# 🚀 OffersExtractor Flask - Deployment & Integration Guide

## ✅ What's Been Created

You now have a **production-ready standalone Flask API** for PDF offer extraction at:
- **Directory**: `C:\xampp\htdocs\Stagi-app\OffersExtractorFlask\`
- **Local URL**: `http://localhost:5100`
- **Production URL**: `https://extractor.stagi-edu.com` (after deployment)

---

## 📦 What's Included

```
OffersExtractorFlask/
├── app.py                      # Main Flask application
├── wsgi.py                     # WSGI entry point for production
├── requirements.txt            # Python dependencies
├── .env                        # Environment configuration (with your API key)
├── .env.example                # Example environment file
├── .gitignore                  # Git ignore rules
├── README.md                   # Full documentation
├── QUICKSTART.md               # Quick start guide
├── deploy.sh                   # Automated deployment script (Linux)
├── supervisor.conf             # Supervisor configuration
├── offers-extractor.service    # Systemd service file
├── nginx.conf                  # Nginx reverse proxy config
└── test_service.py             # Testing script
```

---

## 🎯 Current Status

- ✅ **Flask app created and tested locally**
- ✅ **Dependencies installed**
- ✅ **DeepSeek API key configured**
- ✅ **Health endpoint working**: `http://localhost:5100/health`
- ✅ **Ready for production deployment**

---

## 🔌 Laravel Integration (Already Done)

Your Laravel app **already has the proxy setup** in `ExtractorController.php`. Just update the environment variable:

### Update Laravel `.env`:

```env
# Local development (current value):
EXTRACTOR_URL=http://127.0.0.1:5100

# After production deployment, change to:
EXTRACTOR_URL=https://extractor.stagi-edu.com
```

### How It Works:

1. **Frontend** calls: `POST /api/v1/extractor/extract`
2. **Laravel** proxies to: `POST {EXTRACTOR_URL}/extract_offers`
3. **Flask** processes PDF and returns offers
4. **Laravel** transforms response for frontend

**No frontend code changes needed!** The proxy is already implemented.
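If you want to exercise the Flask service without going through Laravel (for example from a backend test script), the request Laravel forwards in step 2 is a plain multipart upload with the file in a `pdf` field. Below is a minimal Python sketch that builds an equivalent request using only the standard library. `build_extract_request` is a hypothetical helper for illustration, not code taken from `ExtractorController.php` or `app.py`.

```python
import urllib.request
import uuid


def build_extract_request(base_url: str, pdf_bytes: bytes,
                          filename: str = "offers.pdf") -> urllib.request.Request:
    """Build (but do not send) a multipart POST to {base_url}/extract_offers.

    Hypothetical helper: mirrors step 2 above, with the file in a form
    field named "pdf".
    """
    boundary = uuid.uuid4().hex
    # One form part: opening boundary, part headers, blank line, raw bytes,
    # then the closing boundary.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="pdf"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # Tolerate EXTRACTOR_URL values with or without a trailing slash.
    url = base_url.rstrip("/") + "/extract_offers"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it is then `urllib.request.urlopen(req, timeout=300)`; the 300-second timeout mirrors the one listed under Performance Notes.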

---

## 🏗️ Production Deployment Steps

### Step 1: Prepare Your Server

Requirements:
- Ubuntu/Debian Linux server
- Python 3.9+
- Nginx
- Supervisor or systemd
- Domain: `extractor.stagi-edu.com` pointing to server IP

### Step 2: Upload Files

```bash
# From your local machine
scp -r C:\xampp\htdocs\Stagi-app\OffersExtractorFlask user@your-server:/tmp/

# On server
sudo mv /tmp/OffersExtractorFlask /var/www/offers-extractor
sudo chown -R www-data:www-data /var/www/offers-extractor
```

### Step 3: Run Automated Deployment

```bash
ssh user@your-server
cd /var/www/offers-extractor
sudo chmod +x deploy.sh
sudo ./deploy.sh
```

The script will:
- ✅ Install system dependencies
- ✅ Create virtual environment
- ✅ Install Python packages
- ✅ Set up Supervisor
- ✅ Configure Nginx
- ✅ Optionally set up SSL

### Step 4: Verify Deployment

```bash
# Check service status
sudo supervisorctl status offers-extractor

# Test health endpoint
curl https://extractor.stagi-edu.com/health
```

Expected response:
```json
{
  "status": "healthy",
  "service": "OffersExtractor",
  "deepseek_configured": true
}
```

### Step 5: Update Laravel

```bash
# On your Laravel server
cd /path/to/stagi-app
nano .env
```

Change:
```env
EXTRACTOR_URL=https://extractor.stagi-edu.com
```

Then refresh Laravel's cached configuration so the new value takes effect:
```bash
php artisan config:cache
php artisan cache:clear
```

---

## 🧪 Testing

### Test from Windows (Development)

```powershell
# Test health
Invoke-WebRequest -Uri "http://localhost:5100/health"

# Test extraction (replace path with actual PDF)
# Note: -Form requires PowerShell 6.1 or later (not available in Windows PowerShell 5.1)
$form = @{ pdf = Get-Item "C:\path\to\test.pdf" }
Invoke-RestMethod -Uri "http://localhost:5100/extract_offers" -Method Post -Form $form
```

### Test from Linux (Production)

```bash
# Test health
curl https://extractor.stagi-edu.com/health

# Test extraction
curl -X POST https://extractor.stagi-edu.com/extract_offers \
  -F "pdf=@/path/to/test.pdf" \
  | jq .
```

### Test via Laravel Proxy

```bash
# This is what your frontend actually calls
curl -X POST https://your-laravel-api.com/api/v1/extractor/extract \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "pdf=@/path/to/test.pdf"
```

---

## 📊 Monitoring

### View Logs

```bash
# Application logs
tail -f /var/log/offers-extractor/error.log
tail -f /var/log/offers-extractor/access.log

# Nginx logs
tail -f /var/log/nginx/extractor.stagi-edu.com.access.log
```

### Check Service Health

```bash
# Supervisor
sudo supervisorctl status offers-extractor

# Or systemd
sudo systemctl status offers-extractor
```

### Restart Service

```bash
# Supervisor
sudo supervisorctl restart offers-extractor

# Or systemd
sudo systemctl restart offers-extractor
```

---

## 🔒 Security Features

- ✅ **HTTPS**: SSL/TLS encryption via Let's Encrypt
- ✅ **File Validation**: Only PDF files accepted
- ✅ **Size Limits**: 15MB max file size
- ✅ **Security Headers**: X-Frame-Options, CSP, etc.
- ✅ **API Key**: DeepSeek API key in .env (not in code)
- ✅ **Temp File Cleanup**: Automatic deletion after processing
- ✅ **CORS**: Handled by Nginx in production (browsers call Laravel, not Flask, so Laravel-to-Flask requests are server-to-server and not subject to CORS)
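The file-validation and size-limit checks can be approximated as below. This is a sketch, not the code in `app.py`: the extension check, the `%PDF-` header check and the name `validate_pdf_upload` are assumptions chosen to illustrate the idea.

```python
MAX_UPLOAD_BYTES = 15 * 1024 * 1024  # 15 MB, matching the limit stated above


def validate_pdf_upload(filename: str, data: bytes,
                        max_bytes: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError if the upload is not an acceptable PDF (hypothetical helper)."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("Only .pdf files are accepted")
    if len(data) > max_bytes:
        raise ValueError(f"File exceeds {max_bytes} bytes")
    # PDF files carry a "%PDF-" header; some readers tolerate a few leading
    # bytes, so look for it in the first 1024 bytes. This catches renamed files.
    if b"%PDF-" not in data[:1024]:
        raise ValueError("File content is not a PDF")
```

Checking the content header as well as the extension matters because the extension is whatever the client chose to send.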

---

## 🚨 Troubleshooting

### Service Won't Start

```bash
# Check logs
sudo supervisorctl tail offers-extractor stderr

# Test manually
cd /var/www/offers-extractor
source venv/bin/activate
python app.py
```

### DeepSeek API Errors

```bash
# Verify API key is set
grep DEEPSEEK /var/www/offers-extractor/.env

# Test API key
curl https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-chat","messages":[{"role":"user","content":"test"}]}'
```
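Pasting the key into a `curl` command leaves it in your shell history. Here is a Python sketch of the same request that reads the key from the environment instead. The variable name `DEEPSEEK_API_KEY` is an assumption; use whichever `DEEPSEEK...` name your `.env` actually defines. `build_deepseek_probe` is a hypothetical helper.

```python
import json
import os
import urllib.request
from typing import Optional

DEEPSEEK_URL = "https://api.deepseek.com/chat/completions"


def build_deepseek_probe(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the same minimal chat request as the curl command above."""
    # Assumption: the key is exported as DEEPSEEK_API_KEY.
    key = api_key or os.environ["DEEPSEEK_API_KEY"]
    payload = {"model": "deepseek-chat",
               "messages": [{"role": "user", "content": "test"}]}
    return urllib.request.Request(
        DEEPSEEK_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
    )
```

Send it with `urllib.request.urlopen(build_deepseek_probe(), timeout=60)`: a `200` status means the key works, while a rejected key surfaces as `urllib.error.HTTPError` (typically a 401).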

### 502 Bad Gateway

```bash
# Check if service is running
sudo supervisorctl status offers-extractor

# Check Nginx config
sudo nginx -t

# Check if port 5100 is listening (ss ships with iproute2; netstat needs the older net-tools package)
sudo ss -tlnp | grep 5100
```
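If neither `ss` nor `netstat` is available, a tiny Python check does the same job. It is a sketch: `is_port_listening` is a hypothetical name, and a successful connection only proves that *something* is listening on the port, not that it is the Gunicorn process.

```python
import socket


def is_port_listening(host: str = "127.0.0.1", port: int = 5100,
                      timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex() returns 0 on success and an error number (for example
        # for a refused connection) instead of raising.
        return s.connect_ex((host, port)) == 0
```

Run it on the server itself, e.g. `python3 -c "..."` with `is_port_listening()`; if it returns `False`, the 502 is coming from Nginx having nothing to proxy to.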

### 413 Request Entity Too Large

```bash
# Update Nginx
sudo nano /etc/nginx/nginx.conf
# Add inside the http { } block (or the extractor's server { } block):
#   client_max_body_size 15M;

sudo nginx -t
sudo systemctl reload nginx
```
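Nginx's default `client_max_body_size` is `1m`, so without this directive any upload over 1 MB is rejected before it reaches Flask. In Nginx sizes, `k`/`K` means kilobytes (1024 bytes) and `m`/`M` megabytes (1024 × 1024 bytes). If the Flask app enforces its own limit, keeping both numbers in sync avoids confusing 413s. A small hypothetical helper for converting Nginx sizes:

```python
import re

# Nginx size syntax: plain bytes, or a k/K or m/M suffix.
_NGINX_SIZE = re.compile(r"^(\d+)([kKmM]?)$")
_MULTIPLIER = {"": 1, "k": 1024, "m": 1024 * 1024}


def nginx_size_to_bytes(value: str) -> int:
    """Convert an nginx size such as '15M' to a byte count."""
    # Accept values copied straight from a config line, trailing ';' included.
    match = _NGINX_SIZE.match(value.strip().rstrip(";"))
    if not match:
        raise ValueError(f"Unsupported nginx size: {value!r}")
    number, unit = match.groups()
    return int(number) * _MULTIPLIER[unit.lower()]
```

For example, if `app.py` uses Flask's `MAX_CONTENT_LENGTH` setting (an assumption; check the app), `app.config["MAX_CONTENT_LENGTH"] = nginx_size_to_bytes("15M")` keeps the two limits identical.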

---

## 🔄 Updating the Service

When you make code changes:

```bash
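# 1. From your local machine, upload the changed files (e.g. app.py)
scp OffersExtractorFlask/app.py user@your-server:/tmp/

# 2. On the server, move them into place and restore ownership
sudo mv /tmp/app.py /var/www/offers-extractor/
sudo chown www-data:www-data /var/www/offers-extractor/app.py

# 3. If requirements.txt changed, reinstall dependencies first
cd /var/www/offers-extractor
source venv/bin/activate
pip install -r requirements.txt

# 4. Restart the service to load the new code
sudo supervisorctl restart offers-extractor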
```

---

## 📈 Performance Notes

- **Workers**: 4 Gunicorn workers (adjust based on CPU cores; Gunicorn's documentation suggests (2 × cores) + 1 as a starting point)
- **Timeout**: 300 seconds (for long PDF processing)
- **Rate Limiting**: 1 second between DeepSeek API calls
- **Deduplication**: Automatic offer deduplication
- **Cleanup**: Temp files auto-deleted after processing
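The rate limiting and deduplication above can be sketched as follows. This is an illustration under assumptions, not the logic in `app.py`: it assumes "1 second between calls" means a minimum interval measured from the previous call, and it treats offers whose titles match after lowercasing and collapsing whitespace as duplicates. `MinIntervalLimiter` and `deduplicate_offers` are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Optional


class MinIntervalLimiter:
    """Block so that successive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> None:
        """Call before each DeepSeek request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def deduplicate_offers(offers: List[Dict]) -> List[Dict]:
    """Keep the first offer for each normalised (lowercased, whitespace-collapsed) title."""
    seen = set()
    unique = []
    for offer in offers:
        key = " ".join((offer.get("title") or "").lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(offer)
    return unique
```

The clock and sleep functions are injectable so the limiter can be tested without real delays. Such a limiter is per process and not thread-safe: with 4 Gunicorn workers, each worker keeps its own timer, so the service as a whole can still send roughly four DeepSeek requests per second.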

---

## 🎯 API Reference

### GET /health

Returns service health status.

**Response:**
```json
{
  "status": "healthy",
  "service": "OffersExtractor",
  "deepseek_configured": true
}
```

### POST /extract_offers

Extracts offers from uploaded PDF.

**Request:**
- `Content-Type: multipart/form-data`
- Field: `pdf` (file, max 15MB)

**Response (Success):**
```json
{
  "offers": [
    {
      "title": "Stage Full Stack Developer",
      "description": "...",
      "skills": ["Angular", "Laravel"],
      "duration_months": 6,
      "tags": ["Web", "Full Stack"],
      "is_paid": true
    }
  ],
  "count": 1
}
```

**Response (Error):**
```json
{
  "error": "Error type",
  "message": "Detailed message"
}
```
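A client of this endpoint (Laravel, or a test script) has to handle both shapes. The Python sketch below parses either one; the `Offer` fields mirror the example response, but which fields are optional, and the choice to ignore unknown keys, are assumptions. `Offer`, `ExtractorError` and `parse_extract_response` are hypothetical names.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Offer:
    title: str
    description: str = ""
    skills: List[str] = field(default_factory=list)
    duration_months: Optional[int] = None
    tags: List[str] = field(default_factory=list)
    is_paid: Optional[bool] = None


class ExtractorError(Exception):
    """Raised for the documented {"error", "message"} response shape."""


def parse_extract_response(body: str) -> List[Offer]:
    """Parse a /extract_offers JSON body into Offer objects."""
    data = json.loads(body)
    if "error" in data:
        raise ExtractorError(f"{data['error']}: {data.get('message', '')}")
    known = {f.name for f in fields(Offer)}
    # Ignore keys this sketch doesn't model instead of failing on them.
    offers = [Offer(**{k: v for k, v in item.items() if k in known})
              for item in data.get("offers", [])]
    # The documented "count" field should equal the number of offers.
    if data.get("count", len(offers)) != len(offers):
        raise ExtractorError("count does not match the number of offers")
    return offers
```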

---

## 📞 Next Steps

1. ✅ **Local Development**: Already working at `http://localhost:5100`
2. ⏳ **Deploy to Production Server**: Run `deploy.sh` on your server
3. ⏳ **Set Up Domain**: Point `extractor.stagi-edu.com` to server IP
4. ⏳ **Set Up SSL**: Run `sudo certbot --nginx -d extractor.stagi-edu.com`
5. ⏳ **Update Laravel**: Change `EXTRACTOR_URL` in Laravel `.env`
6. ✅ **Test**: Frontend → Laravel → Flask → Response

---

## 📝 Important Notes

- **Frontend should NEVER call Flask directly** - Always go through Laravel proxy
- **DeepSeek API key** is in `.env` - Keep it secret!
- **Logs** are in `/var/log/offers-extractor/` (production)
- **Service runs on port 5100** (internal, Nginx proxies from 443)
- **Max file size: 15MB** (configurable in Nginx)
- **Timeout: 300 seconds** (5 minutes for large PDFs)

---

## ✅ Summary

You now have:
1. ✅ A standalone, production-ready Flask API
2. ✅ Working locally on port 5100
3. ✅ Complete deployment configuration (Nginx, Supervisor, SSL)
4. ✅ Laravel proxy already configured
5. ✅ No frontend changes needed
6. ✅ Automated deployment script
7. ✅ Comprehensive documentation

**Ready to deploy to production!** 🚀
