refactor(oi): improve data extraction and consolidate documentation

- Fix MQL5 API usage in EA to use correct CopyRates and POSITION_TYPE enums
- Refactor scraper data extraction to use drop_duplicates for unique strikes
- Consolidate Windows setup guide into main README
- Add virtual environment batch files for easier setup and execution
- Simplify run_scraper.bat to focus on core execution
- Normalize lot calculation to use SymbolInfo.LotsStep()
Kunthawat Greethong
2026-01-06 20:18:12 +07:00
parent 2e8e07ed17
commit b7c0e68fa8
8 changed files with 386 additions and 895 deletions


@@ -1,179 +1,268 @@
# CME OI Scraper
Python scraper to pull Open Interest data from CME Group QuikStrike and current gold price from investing.com.
Python scraper that extracts Open Interest data from CME Group QuikStrike and current gold price from investing.com.
## What It Extracts
1. **OI Levels (from CME QuikStrike):**
- Top 3 CALL strikes by OI volume
- Top 3 PUT strikes by OI volume
- Top 3 CALL strikes by OI volume (unique strikes)
- Top 3 PUT strikes by OI volume (unique strikes)
2. **Gold Price (from investing.com):**
- Current gold futures price (e.g., 4345.50)
- Current gold futures price (e.g., 4476.50)
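The "unique strikes" selection can be illustrated with a minimal plain-Python sketch. The commit message says the scraper itself uses pandas `drop_duplicates`; the version below is only an equivalent illustration, and it assumes that when a strike appears more than once the row with the highest OI is kept. The function name `top_unique_strikes` and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple


def top_unique_strikes(rows: List[Tuple[float, int]], n: int = 3) -> List[Tuple[float, int]]:
    """Return the n strikes with the highest OI, one entry per strike."""
    best: Dict[float, int] = {}
    for strike, oi in rows:
        # Keep only the largest OI seen for each strike (the de-duplication step).
        if strike not in best or oi > best[strike]:
            best[strike] = oi
    # Rank the unique strikes by OI, highest first, and keep the top n.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical sample: strike 4500.0 appears twice and must be counted once.
calls = [(4500.0, 176), (4450.0, 173), (4500.0, 120), (4375.0, 147), (4400.0, 90)]
top_calls = top_unique_strikes(calls)
print(top_calls)
```

Without the de-duplication step, a repeated strike could take up two of the three slots.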
## Prerequisites
- Python 3.9+
- CME Group QuikStrike account with login credentials
- Python 3.9 or higher
- CME Group QuikStrike account (free registration at https://www.cmegroup.com)
- Windows 10/11 (for batch files) or Linux/macOS
## Installation
## Quick Start
1. Copy environment variables:
```bash
cp .env.example .env
```
### Windows
2. Edit `.env` and add your CME credentials:
```bash
CME_USERNAME=your_username
CME_PASSWORD=your_password
```
1. **Run one-time setup:**
```cmd
cd C:\Path\To\oi_scraper
setup_env.bat
```
3. Install dependencies:
```bash
pip install -r requirements.txt
playwright install chromium
```
2. **Run the scraper:**
```cmd
run_with_venv.bat
```
## Usage
### Linux/macOS
### Basic Scraping
1. **Setup:**
```bash
cd /path/to/oi_scraper
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
```
```bash
python main.py
```
This will:
- Login to CME QuikStrike
- Navigate to OI Heatmap
- Extract top 3 CALL and PUT strikes by OI volume
- Scrape current gold price from investing.com
- Export to `oi_data.csv`
### Session Persistence
The scraper automatically saves your login session to `cookies.json`. This means:
- **First run**: Logs in with your credentials, saves cookies
- **Subsequent runs**: Uses saved cookies if session is still valid
- **Session expired**: Automatically logs in again and saves new cookies
Benefits for scheduled runs:
- Faster execution (skips login when session is valid)
- Reduces login attempts to CME servers
- CME sessions typically last several days/weeks
To force a fresh login, delete `cookies.json`:
```bash
rm cookies.json
```
### Output Format
The CSV output is compatible with the EA's `LoadOIFromCSV()` and `LoadFuturePriceFromCSV()` functions:
```csv
Type,Strike,OI
CALL,4345,155398
CALL,4350,229137
CALL,4360,90649
PUT,4300,227936
PUT,4290,270135
PUT,4280,65839
[Price]
FuturePrice,4345.50
```
**Note:** The `[Price]` section contains the current gold futures price scraped from investing.com. The EA reads this value for Delta calculation.
2. **Run:**
```bash
source venv/bin/activate
python main.py
```
## Configuration
Edit `.env` to customize:
### Edit `.env` File
- `PRODUCT_URL` - QuikStrike product page URL (requires login)
- `CME_LOGIN_URL` - CME login page URL (default: SSO URL)
- `TOP_N_STRIKES` - Number of top strikes to export (default: 3)
- `HEADLESS` - Run browser in headless mode (default: false for debugging)
- `CSV_OUTPUT_PATH` - Output CSV file path
- `TIMEOUT_SECONDS` - Page load timeout
Copy and edit the environment file:
### Available Products
**Gold (XAUUSD/COMEX Gold - OG|GC):**
```
PRODUCT_URL=https://cmegroup.quikstrike.net/User/QuikStrikeView.aspx?pid=40&viewitemid=IntegratedOpenInterestTool
```cmd
copy .env.example .env
notepad .env
```
**Silver:**
```
PRODUCT_URL=https://cmegroup.quikstrike.net/User/QuikStrikeView.aspx?pid=41&viewitemid=IntegratedOpenInterestTool
Required settings:
```env
CME_USERNAME=your_cme_username
CME_PASSWORD=your_cme_password
```
**SOFR (3M SOFR):**
```
PRODUCT_URL=https://cmegroup.quikstrike.net/User/QuikStrikeView.aspx?pid=476&viewitemid=IntegratedOpenInterestTool
Optional settings:
```env
# Number of top strikes to export (default: 3)
TOP_N_STRIKES=3
# Run browser without window (default: false)
HEADLESS=false
# Page timeout in seconds (default: 30)
TIMEOUT_SECONDS=30
# Output CSV path
CSV_OUTPUT_PATH=./oi_data.csv
# Logging level: DEBUG, INFO, WARNING, ERROR
LOG_LEVEL=INFO
```
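As a rough sketch of how these optional settings might be read with their documented defaults: the helper below is hypothetical (it is not taken from `main.py`), and it assumes the values from `.env` have already been loaded into the process environment. The `./oi_data.csv` default for `CSV_OUTPUT_PATH` is also an assumption based on the example above.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the optional scraper settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "top_n_strikes": int(env.get("TOP_N_STRIKES", "3")),
        # Treat "true", "1" and "yes" (any case) as enabled; anything else is off.
        "headless": env.get("HEADLESS", "false").strip().lower() in {"true", "1", "yes"},
        "timeout_seconds": int(env.get("TIMEOUT_SECONDS", "30")),
        "csv_output_path": env.get("CSV_OUTPUT_PATH", "./oi_data.csv"),
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
    }


# Passing a plain dict makes the behaviour easy to check without touching os.environ.
settings = load_settings({"HEADLESS": "True", "TIMEOUT_SECONDS": "60"})
print(settings)
```

Parsing booleans explicitly avoids the common pitfall where the non-empty string `"false"` is treated as true.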
**Note:** You must be logged in to access QuikStrike data. The scraper will automatically login using credentials from `.env`.
## Output Format
The scraper exports to `oi_data.csv`:
```csv
Type,Strike,OI
CALL,4375.0,147
CALL,4450.0,173
CALL,4500.0,176
PUT,4435.0,49
PUT,4400.0,102
PUT,4515.0,150
[Price]
FuturePrice,4467.8
```
The `[Price]` section contains the current gold futures price scraped from investing.com.
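For anyone consuming this file outside the EA, the layout can be read with a short Python sketch. The parser below is illustrative only: `parse_oi_csv` is a hypothetical name, and it assumes the file always has the `Type,Strike,OI` header followed by an optional `[Price]` section, as in the sample above.

```python
import csv
import io
from typing import List, Optional, Tuple


def parse_oi_csv(text: str) -> Tuple[List[dict], Optional[float]]:
    """Split the file into OI rows and the futures price from the [Price] section."""
    levels: List[dict] = []
    price: Optional[float] = None
    in_price_section = False
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0] == "Type":
            continue  # skip blank lines and the header row
        if row[0].strip() == "[Price]":
            in_price_section = True  # everything after this marker is price data
            continue
        if in_price_section:
            if row[0] == "FuturePrice":
                price = float(row[1])
        else:
            levels.append({"type": row[0], "strike": float(row[1]), "oi": int(row[2])})
    return levels, price


sample = """Type,Strike,OI
CALL,4375.0,147
PUT,4435.0,49
[Price]
FuturePrice,4467.8
"""
levels, price = parse_oi_csv(sample)
print(levels, price)
```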
## Session Persistence
The scraper saves login sessions to `cookies.json`:
- **First run:** Logs in with credentials, saves cookies
- **Subsequent runs:** Uses saved cookies if session is valid
- **Session expired:** Automatically logs in again and saves new cookies
This makes scheduled runs faster and reduces login attempts to CME servers.
To force a fresh login:
```cmd
del cookies.json
```
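The save-and-reuse flow described above can be sketched with the standard library alone. This is not the code from `main.py`; the helper names and the sample cookie are hypothetical. With Playwright, the list passed to `save_cookies` could come from `context.cookies()`, and a loaded list could be restored with `context.add_cookies(...)`.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def load_cookies(path: Path) -> Optional[List[dict]]:
    """Return saved cookies, or None when the file is missing or unreadable."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def save_cookies(path: Path, cookies: List[dict]) -> None:
    """Write cookies to disk so the next run can try to reuse the session."""
    path.write_text(json.dumps(cookies, indent=2), encoding="utf-8")


# Demo in a temporary directory so no real cookies.json is touched.
with tempfile.TemporaryDirectory() as tmp:
    cookie_file = Path(tmp) / "cookies.json"
    first_run = load_cookies(cookie_file)   # nothing saved yet, so a login is needed
    save_cookies(cookie_file, [{"name": "session", "value": "abc"}])
    second_run = load_cookies(cookie_file)  # the saved session is available
```

Returning `None` for a missing or corrupt file lets the caller fall back to a normal login, which matches the "delete `cookies.json` to force a fresh login" behaviour.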
## Integration with EA
The EA reads OI data from CSV when `InpOISource = OI_SOURCE_CSV_FILE`.
The EA reads OI data from CSV when configured:
```mql5
input ENUM_OI_SOURCE InpOISource = OI_SOURCE_CSV_FILE;
```
Place the generated `oi_data.csv` in MetaTrader's `MQL5/Files` directory.
Copy `oi_data.csv` to your MT5 `MQL5/Files` directory:
```
C:\Users\YourUsername\AppData\Roaming\MetaQuotes\Terminal\Common\MQL5\Files\oi_data.csv
```
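The copy step can also be automated after each scraper run. The sketch below is a hypothetical helper, not part of the repository, and the destination directory is a placeholder: pass the folder your terminal actually reads from (in MT5, File → Open Data Folder shows the terminal's data directory).

```python
import shutil
import tempfile
from pathlib import Path


def publish_csv(source: Path, mt5_files_dir: Path) -> Path:
    """Copy the scraper output into the directory the EA reads from."""
    if not mt5_files_dir.is_dir():
        raise FileNotFoundError(f"MT5 Files directory not found: {mt5_files_dir}")
    # copy2 also preserves the modification time, which helps spot stale data.
    return Path(shutil.copy2(source, mt5_files_dir / source.name))


# Demo with temporary directories standing in for the real locations.
with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp) / "oi_scraper"
    files_dir = Path(tmp) / "Files"
    work_dir.mkdir()
    files_dir.mkdir()
    (work_dir / "oi_data.csv").write_text("Type,Strike,OI", encoding="utf-8")
    copied = publish_csv(work_dir / "oi_data.csv", files_dir)
    copied_name = copied.name
    copied_text = copied.read_text(encoding="utf-8")
```

Failing loudly when the destination does not exist is deliberate: a silent copy to the wrong folder would leave the EA reading old data.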
## Scheduling
## Automatic Daily Scheduling
Use cron or Windows Task Scheduler to run periodically:
### Windows Task Scheduler
1. **Create scheduled task:**
- Open Task Scheduler (`taskschd.msc`)
- Click "Create Task"
2. **Configure General tab:**
- Name: `CME OI Scraper - Daily`
- ✅ Run whether user is logged on or not
- ✅ Run with highest privileges
3. **Configure Triggers tab:**
- New → On a schedule → Daily
- Start time: 9:00 AM (or your preferred time)
- ✅ Enabled
4. **Configure Actions tab:**
- Action: Start a program
- Program/script:
```
C:\Path\To\oi_scraper\run_scheduled.bat
```
- Start in:
```
C:\Path\To\oi_scraper
```
5. **Click OK to save**
### Linux/macOS (cron)
```bash
# Run every hour
0 * * * * cd /path/to/oi_scraper && python main.py
# Edit crontab
crontab -e
# Add line to run every day at 9 AM
0 9 * * * cd /path/to/oi_scraper && /path/to/venv/bin/python main.py
```
## Batch Files Reference
| File | Purpose |
|------|---------|
| `setup_env.bat` | One-time setup (creates virtual environment) |
| `run_with_venv.bat` | Manual run with visible window |
| `run_scheduled.bat` | For Task Scheduler (no window, no pause) |
## Troubleshooting
**Login fails:**
### Module Not Found Errors
**Error:** `ModuleNotFoundError: No module named 'playwright'`
**Solution:**
```cmd
run_with_venv.bat
```
Running inside the virtual environment keeps the scraper's dependencies isolated from the system Python.
### Login Fails
- Verify credentials in `.env`
- Check if CME requires 2FA
- Set `HEADLESS=false` to see what's happening
- Check screenshots: `login_failed.png`, `login_error.png`, `login_success.png`
- Check if CME requires 2FA (manual intervention needed)
- Set `HEADLESS=false` to see browser activity
- Check screenshots: `login_failed.png`, `login_error.png`
**No data extracted:**
- Check if table structure changed
- Increase `TIMEOUT_SECONDS`
- Check logs for detailed errors
- Screenshot saved as `login_debug.png` or `login_failed.png`
### No Data Extracted
**Login page selectors changed:**
- If the scraper can't find username/password inputs, CME may have updated their login page
- Update the selectors in `login_to_cme()` function in `main.py`:
```python
# Example: update to match current CME login form
page.fill('input[id="username"]', CME_USERNAME)
page.fill('input[id="password"]', CME_PASSWORD)
page.click('button[type="submit"]')
```
- Check if CME table structure changed
- Increase `TIMEOUT_SECONDS=60` in `.env`
- Check logs for errors
- Screenshot saved as `login_debug.png`
**Browser issues:**
- Install Chromium dependencies: `playwright install chromium`
- Try different browser: Change `p.chromium.launch()` to `p.firefox.launch()`
### Browser Issues
```cmd
REM Reinstall Chromium
python -m playwright install chromium
```
### Session Expires Frequently
Delete cookies to force fresh login:
```cmd
del cookies.json
```
### Check Python Path Issues (Windows)
```cmd
REM Check which Python is being used
where python
REM Use Python launcher
py -3 main.py
REM Or use the virtual environment
run_with_venv.bat
```
## Finding Product IDs
To scrape other instruments (Silver, Crude Oil, etc.):
1. Visit CME QuikStrike OI Heatmap
2. Login to your CME account
3. Select a product from the dropdown
4. The URL updates with the `pid` parameter
5. Note: This scraper is configured for Gold by default
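To read the `pid` value out of a product URL programmatically, a small sketch using only the standard library is shown below. The function name `product_id` is hypothetical; the Gold URL is the one used for `PRODUCT_URL` in this README.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def product_id(url: str) -> Optional[str]:
    """Return the value of the `pid` query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("pid")
    return values[0] if values else None


gold_url = (
    "https://cmegroup.quikstrike.net/User/QuikStrikeView.aspx"
    "?pid=40&viewitemid=IntegratedOpenInterestTool"
)
pid = product_id(gold_url)
print(pid)
```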
## Notes
- The scraper targets the OI Heatmap table structure
- Only exports top N strikes by OI volume
- Login session is not persisted (login each run)
- Cookies could be saved for faster subsequent runs
- Targets the OI Heatmap table structure
- Exports top N unique strikes by OI volume
- Uses session cookies for faster subsequent runs
- CME sessions typically last several days to weeks
- Virtual environment recommended to avoid Python path conflicts
### Finding Product IDs
## Files
To find product IDs for other instruments:
1. Visit https://www.cmegroup.com/tools-information/quikstrike/open-interest-heatmap.html
2. Login to your CME account
3. Select a product from the "Products" menu
4. The URL will update with the `pid` parameter
5. Copy that URL to your `.env` file
Example: `https://www.cmegroup.com/tools-information/quikstrike/open-interest-heatmap.html?pid=40` (Gold)
```
oi_scraper/
├── main.py # Main scraper script
├── requirements.txt # Python dependencies
├── .env.example # Environment template
├── .env # Your credentials (create from example)
├── setup_env.bat # Windows: Create virtual environment
├── run_with_venv.bat # Windows: Manual run
├── run_scheduled.bat # Windows: Task Scheduler run
├── oi_data.csv # Output file (generated)
├── cookies.json # Session cookies (generated)
└── scraper.log # Log file (generated)
```