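Built on the lightweight Flask web framework and Python, this project uses multithreading and Selenium to scrape job postings from a recruitment website.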

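Contents

1. Basic environment setup

2. Kernel configuration

2. Building a Kubernetes cluster with three masters and two workers

3. Deploying a highly available Keepalived and HAProxy cluster

4. Setting up MySQL with master-slave replication and read/write splitting

5. Deploying the Flask application to the worker nodes

6. Setting up NFS shared storage and creating the PV and PVC

7. Installing an intranet tunneling tool and configuring port forwarding

8. Final result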


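1. Basic environment setup

My setup: five servers, each with at least 2 CPU cores and 4 GB of RAM, running CentOS 7.9.

If you have fewer resources, a single master and two nodes will also work.

  1.1 Set the hostname and hosts file on all nodes

hostnamectl set-hostname k8s-master01    # use each node's own name

        Apply it immediately:

systemctl restart systemd-hostnamed

exec bash

        vim /etc/hosts: (remember to change the IPs)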

192.168.163.151 k8s-master01
192.168.163.152 k8s-master02
192.168.163.153 k8s-master03
192.168.163.154 k8s-node01
192.168.163.155 k8s-node02

1.2 Configure the Docker, Kubernetes, and default yum repositories on all nodes

curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo   

yum install -y yum-utils device-mapper-persistent-data lvm2   

yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo   

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF


sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo   

1.3 Install some common tools on all nodes

yum install wget jq psmisc vim net-tools telnet git -y

1.4 Disable the firewall, SELinux, and dnsmasq on all nodes

systemctl disable --now firewalld  
systemctl disable --now dnsmasq  
systemctl disable --now NetworkManager  

setenforce 0  

sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/sysconfig/selinux  
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config

1.5 Disable swap on all nodes

swapoff -a && sysctl -w vm.swappiness=0  

sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab

1.6 Install ntpdate on all nodes

rpm -ivh http://mirrors.wlnmp.com/centos/wlnmp-release-centos.noarch.rpm

yum install ntpdate -y

 Synchronize the time:

ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
echo 'Asia/Shanghai' >/etc/timezone
ntpdate time2.aliyun.com
crontab -e
Then add this line: */5 * * * * /usr/sbin/ntpdate time2.aliyun.com

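1.7 Configure limits on all nodes

ulimit -SHn 65535
vim /etc/security/limits.conf
# Append the following at the end of the file (the leading * applies each limit to all users)
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited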

1.8 Set up passwordless SSH from Master01 to the other nodes

ssh-keygen -t rsa

for i in k8s-master01 k8s-master02 k8s-master03 k8s-node01 k8s-node02; do ssh-copy-id -i .ssh/id_rsa.pub $i; done

1.9 Download all the installation source files on Master01

cd /root/ ; git clone https://gitee.com/dukuan/k8s-ha-install.git

1.10 Upgrade the system on all nodes and reboot

yum update -y && reboot

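2. Kernel configuration

For cluster stability and compatibility, production kernels should be upgraded to 4.18 or later; this example upgrades to 4.19.

2.1 Download the offline packages on Master01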

cd /root
wget http://193.49.22.109/elrepo/kernel/el7/x86_64/RPMS/kernel-ml-devel-4.19.12-1.el7.elrepo.x86_64.rpm
wget http://193.49.22.109/elrepo/kernel/el7/x86_64/RPMS/kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm

2.2 Copy the packages from Master01 to the other nodes

for i in k8s-master02 k8s-master03 k8s-node01 k8s-node02; do scp kernel-ml-4.19.12-1.el7.elrepo.x86_64.rpm kernel-ml-devel-4.19.12-1.el7.elrepo.x86_64.rpm $i:/root/ ; done

2.3 Install the kernel on all nodes

cd /root && yum localinstall -y kernel-ml*

2.4 Change the kernel boot order on all nodes

grub2-set-default 0 && grub2-mkconfig -o /etc/grub2.cfg

grubby --args="user_namespace.enable=1"  --update-kernel="$(grubby --default-kernel)"

2.5 Check on all nodes that the default kernel is 4.19

grubby --default-kernel  

  /boot/vmlinuz-4.19.12-1.el7.elrepo.x86_64

2.6 Reboot all nodes, then check that the running kernel is 4.19

reboot
uname -a
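
With five nodes, reading each `uname -a` by eye gets tedious. Below is a minimal sketch of the same check in Python; the helper name `kernel_at_least` is my own, and it only compares the major and minor numbers of a `uname -r` style string.

```python
import platform
import re


def kernel_at_least(release, minimum=(4, 19)):
    """Return True if a `uname -r` style string is at least `minimum` (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`
    release = platform.release()
    print(release, "OK" if kernel_at_least(release) else "older than 4.19")
```

Run it on each node after the reboot; the CentOS 7.9 stock kernel (3.10) is reported as too old, while the 4.19.12 elrepo kernel passes.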

2.7 Install ipvsadm and ipset on all nodes

yum install ipvsadm ipset sysstat conntrack libseccomp -y

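2.8 Configure the ipvs modules on all nodes. From kernel 4.19 on, nf_conntrack_ipv4 has been renamed to nf_conntrack; on 4.18 and earlier, use nf_conntrack_ipv4 instead.

vim /etc/modules-load.d/ipvs.conf

Add the following (comments must sit on their own line in this file, otherwise they are read as part of the module name):

ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
# on kernel 4.18 and earlier, use nf_conntrack_ipv4 on the next line
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip

Then run:
systemctl enable --now systemd-modules-load.service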

2.9 Enable the kernel parameters a Kubernetes cluster requires; configure them on all nodes

cat <<EOF > /etc/sysctl.d/k8s.conf  
net.ipv4.ip_forward = 1  
net.bridge.bridge-nf-call-iptables = 1  
net.bridge.bridge-nf-call-ip6tables = 1  
fs.may_detach_mounts = 1  
net.ipv4.conf.all.route_localnet = 1  
vm.overcommit_memory=1  
vm.panic_on_oom=0  
fs.inotify.max_user_watches=89100  
fs.file-max=52706963  
fs.nr_open=52706963  
net.netfilter.nf_conntrack_max=2310720  

net.ipv4.tcp_keepalive_time = 600  
net.ipv4.tcp_keepalive_probes = 3  
net.ipv4.tcp_keepalive_intvl =15  
net.ipv4.tcp_max_tw_buckets = 36000  
net.ipv4.tcp_tw_reuse = 1  
net.ipv4.tcp_max_orphans = 327680  
net.ipv4.tcp_orphan_retries = 3  
net.ipv4.tcp_syncookies = 1  
net.ipv4.tcp_max_syn_backlog = 16384  
net.ipv4.ip_conntrack_max = 65536  
net.ipv4.tcp_timestamps = 0  
net.core.somaxconn = 16384
EOF

sysctl --system
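
A sysctl file assembled by copy and paste can easily end up with the same key twice, and the later value silently wins. The following is a small sketch, not part of the original setup, that lists repeated keys in such a file; `duplicate_sysctl_keys` and the sample text are illustrative names and data of my own.

```python
from collections import Counter


def duplicate_sysctl_keys(conf_text):
    """Return the sorted list of keys that appear more than once in sysctl-style text."""
    keys = []
    for line in conf_text.splitlines():
        line = line.strip()
        # skip blanks, comments, and anything that is not key = value
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        keys.append(line.split("=", 1)[0].strip())
    return sorted(key for key, count in Counter(keys).items() if count > 1)


sample = """
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_max_syn_backlog = 16384
"""
print(duplicate_sysctl_keys(sample))
```

To check the real file, pass it the contents of /etc/sysctl.d/k8s.conf, for example via `open("/etc/sysctl.d/k8s.conf").read()`.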

2.10 After configuring the kernel on all nodes, reboot the servers and confirm the modules are still loaded afterwards

reboot  

lsmod | grep --color=auto -e ip_vs -e nf_conntrack
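
The grep above shows what is loaded, but not what is missing. Here is a rough Python sketch that compares `lsmod` output against a list of required modules; the function name and the sample output are made up for illustration.

```python
def missing_modules(lsmod_text, required):
    """Return the required module names that do not appear in `lsmod` output."""
    lines = lsmod_text.splitlines()[1:]  # the first line is the "Module Size Used by" header
    loaded = {line.split()[0] for line in lines if line.strip()}
    return [name for name in required if name not in loaded]


sample_lsmod = """Module                  Size  Used by
ip_vs_sh               16384  0
ip_vs                 151552  6 ip_vs_sh
nf_conntrack          143360  1 ip_vs
"""
print(missing_modules(sample_lsmod, ["ip_vs", "ip_vs_rr", "nf_conntrack"]))
```

On a real node the text would come from `subprocess.run(["lsmod"], capture_output=True, text=True).stdout`. An empty list means everything required is loaded.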

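2. Building a Kubernetes cluster with three masters and two workers

This section installs the components the cluster uses, such as docker-ce, containerd, and the Kubernetes components.

There are two runtimes to choose from: Docker and containerd.

        If you install Kubernetes 1.24 or later, you need containerd as the runtime, because dockershim was removed in 1.24 (see the official Kubernetes changelog for details). For versions below 1.24, either Docker or containerd works.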

 2.1 Install the runtime (my Kubernetes version is 1.27, so I use containerd as the container runtime.)

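Installing Docker also installs containerd, and Docker is needed later for building images and for the cloud provider's image registry, so Docker is still installed on every node.

Install docker-ce 20.10 on all nodes: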

yum install docker-ce-20.10.* docker-ce-cli-20.10.* -y

2.2 First configure the modules containerd needs (all nodes):

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

2.3 Load the modules on all nodes:

modprobe -- overlay  

modprobe -- br_netfilter

2.4 Configure the kernel parameters containerd needs on all nodes:

cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

2.5 Apply the kernel parameters on all nodes:

sysctl --system

2.6 Generate the containerd configuration file on all nodes:

mkdir -p /etc/containerd  

containerd config default | tee /etc/containerd/config.toml

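2.7 Switch containerd's cgroup driver to systemd on all nodes:

vim /etc/containerd/config.toml

2.8 Find containerd.runtimes.runc.options and add SystemdCgroup = true (or change the existing value to true), as shown in Figure 1.1.

SystemdCgroup = true

2.9 On all nodes, change the pause image in sandbox_image to an address that matches your version: registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6

sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"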

2.10 Start containerd on all nodes and enable it at boot:

systemctl daemon-reload  

systemctl enable --now containerd

2.11 On all nodes, tell the crictl client where the runtime socket is:

cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

2.12 Install the Kubernetes system components

Check which Kubernetes versions are available

yum list kubeadm.x86_64 --showduplicates | sort -r

Install kubeadm, kubelet, and kubectl 1.27 on all nodes

yum install kubeadm-1.27* kubelet-1.27* kubectl-1.27* -y

Enable kubelet at boot on all nodes

systemctl daemon-reload  

systemctl enable --now kubelet

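 2.13 Initialize the cluster

With kubeadm, one master node initializes the cluster and the remaining nodes then join it.

You can initialize either with kubeadm command-line flags or with a configuration file. Because the command-line form needs quite a few flags, this example uses a configuration file.

On Master01, create kubeadm-config.yaml as follows (you can also generate a starting point with kubeadm config print init-defaults > kubeadm-config.yaml)

vim kubeadm-config.yaml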


apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef    # format: 6 characters, a dot, then 16 characters, all from [a-z0-9]
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.163.151
  bindPort: 6443
nodeRegistration:
  # criSocket: unix:///var/run/dockershim.sock            # use this if Docker is the runtime (Kubernetes below 1.24 only)
  criSocket: unix:///run/containerd/containerd.sock       # use this if containerd is the runtime
  name: k8s-master01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
  kubeletExtraArgs:           # for 1.27 these kubelet settings go in the YAML, under nodeRegistration
    # network-plugin, cni-bin-dir, cni-conf-dir and container-runtime are left out: kubelet 1.27 no longer accepts these flags
    container-runtime-endpoint: unix:///run/containerd/containerd.sock
    runtime-request-timeout: 15m
    cgroup-driver: systemd
---
apiServer:
  certSANs:
  - 192.168.163.150
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.163.150:16443
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.27.6    # set this to the version reported by kubeadm version
networking:
  dnsDomain: cluster.local
  podSubnet: 172.16.0.0/12
  serviceSubnet: 192.168.0.0/16    # note: this range contains the node network 192.168.163.0/24 used in this guide; a non-overlapping range avoids routing conflicts
scheduler: {}
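
The pod, service, and node networks should not overlap, and that is easy to get wrong when adapting IPs from another tutorial. Here is a quick sketch using Python's standard `ipaddress` module to check the three ranges. The node range 192.168.163.0/24 is my assumption, inferred from the /etc/hosts entries earlier, and `overlapping_pairs` is a hypothetical helper.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(networks):
    """Return the pairs of names whose CIDR ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [(a, b) for a, b in combinations(parsed, 2) if parsed[a].overlaps(parsed[b])]


ranges = {
    "nodes": "192.168.163.0/24",   # assumption: the LAN the five servers sit on
    "pods": "172.16.0.0/12",       # podSubnet
    "services": "192.168.0.0/16",  # serviceSubnet
}
print(overlapping_pairs(ranges))
```

Any pair it prints deserves a second look before running kubeadm init, since changing the subnets on a live cluster is much harder.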

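Your version may differ from this example, so migrate the kubeadm configuration file to the current format (on Master01)

kubeadm config migrate --old-config kubeadm-config.yaml --new-config new.yaml

 Copy new.yaml to the other master nodes

for i in k8s-master02 k8s-master03; do scp new.yaml $i:/root/; done

Then pre-pull the images on all master nodes to shorten initialization (the other nodes do not need to change anything in the file, not even the IP address)

kubeadm config images pull --config /root/new.yaml

Initialize Master01. Initialization generates the certificates and configuration files under /etc/kubernetes; afterwards the other master nodes simply join Master01

kubeadm init --config /root/new.yaml --upload-certs

After a successful initialization, kubeadm prints a token (yours will differ, so do not copy the ones below) that other nodes use to join. Record it, then join the remaining nodes: the command with --control-plane is for the other master nodes, and the one without it is for the worker nodes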

kubeadm join 192.168.163.150:16443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:60c196cdaacaed263e5d638...... --control-plane --certificate-key f1411a2765268b0759a3b7a8284f9ce31951cf341aad......

kubeadm join 192.168.163.150:16443 --token abcdef.0123456789abcdef  --discovery-token-ca-cert-hash sha256:60c196cdaacaed26e5d638216d59b6b4d1da......
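
kubeadm bootstrap tokens have a fixed shape: six characters, a dot, then sixteen characters, all lowercase letters or digits. A token mistyped by a single character fails validation, so a quick format check before pasting one into a config file can save a failed init. A minimal sketch (the function name is mine):

```python
import re

# 6 characters, a dot, 16 characters, all lowercase letters or digits
TOKEN_PATTERN = re.compile(r"[a-z0-9]{6}\.[a-z0-9]{16}")


def is_valid_bootstrap_token(token):
    """Return True if the string has the shape of a kubeadm bootstrap token."""
    return TOKEN_PATTERN.fullmatch(token) is not None


print(is_valid_bootstrap_token("abcdef.0123456789abcdef"))
print(is_valid_bootstrap_token("7t2weq.bjbawusm0jaxury"))
```

The second sample has only fifteen characters after the dot, which is the kind of slip that is hard to spot by eye.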

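Once all nodes have joined, check the cluster status (for example with kubectl get nodes). The nodes' STATUS field shows NotReady; the exact output varies by version. NotReady nodes become Ready once the CNI plugin is installed

Install Calico from Master01

cd /root/k8s-ha-install && git checkout manual-installation-v1.27.x && cd calico/

Read the pod subnet you configured

POD_SUBNET=$(cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep cluster-cidr= | awk -F= '{print $NF}')

Substitute it into calico.yaml

sed -i "s#POD_CIDR#${POD_SUBNET}#g" calico.yaml  

kubectl apply -f calico.yaml

After it is applied, check the containers and nodes: all containers should be Running and all nodes Ready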

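3. Deploying a highly available Keepalived and HAProxy cluster

I install HAProxy and KeepAlived on three machines; choose whatever suits your own setup. Because controlPlaneEndpoint in the kubeadm configuration points at the VIP on port 16443, HAProxy and KeepAlived need to be running before kubeadm init is executed.

 3.1 Install HAProxy and KeepAlived with yum

yum install keepalived haproxy -y

 3.2 Configure HAProxy

 mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak

mkdir -p /etc/haproxy

vim /etc/haproxy/haproxy.cfg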



global
  maxconn 2000
  ulimit-n 16384
  log 127.0.0.1 local0 err
  stats timeout 30s

defaults
  log global
  mode http
  option httplog
  timeout connect 5000
  timeout client 50000
  timeout server 50000
  timeout http-request 15s
  timeout http-keep-alive 15s

frontend monitor-in
  bind *:33305
  mode http
  option httplog
  monitor-uri /monitor

frontend k8s-master
  bind 0.0.0.0:16443 # listening port
  bind 127.0.0.1:16443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master

backend k8s-master
  mode tcp
  option tcplog
  option tcp-check
  balance roundrobin
  default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
  server k8s-master01 192.168.163.151:6443 check # backend addresses: the master IPs from /etc/hosts
  server k8s-master02 192.168.163.152:6443 check
  server k8s-master03 192.168.163.153:6443 check

 3.3 Configure KeepAlived

 mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak

mkdir -p /etc/keepalived

vim /etc/keepalived/keepalived.conf



! Configuration File for keepalived

global_defs {
  router_id LVS_DEVEL
  script_user root
  enable_script_security
}

vrrp_script chk_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 5
  weight -5
  fall 2
  rise 1
}

vrrp_instance VI_1 {
  state MASTER
  interface ens33 # name of this machine's network interface
  mcast_src_ip 192.168.163.151 # this machine's IP address
  virtual_router_id 51
  priority 101
  advert_int 2
  authentication {
    auth_type PASS
    auth_pass K8SHA_KA_AUTH
  }
  virtual_ipaddress {
    192.168.163.150 # the VIP: an unused address in the same subnet as the hosts
  }
  track_script {
    chk_apiserver
  }
}
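
Once HAProxy and KeepAlived are running, a simple way to confirm the chain works is to test whether the VIP accepts TCP connections on the HAProxy port. The helper below is a hypothetical sketch using only the standard library; it is not the check_apiserver.sh script that the configuration references, just a quick manual probe.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and unreachable hosts
        return False


# Example for this guide's VIP and HAProxy port:
#   tcp_port_open("192.168.163.150", 16443)
```

If this returns False from a master node, check that the VIP is bound on one of the machines (ip addr) and that HAProxy is listening on 16443 before suspecting the API server itself.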

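4. Setting up MySQL with master-slave replication and read/write splitting

4.1 Install the MySQL repository

Method 1:

  Download the official MySQL repository package

    wget https://repo.mysql.com/mysql80-community-release-el7-1.noarch.rpm

  Install the official MySQL repository

    rpm -ivh mysql80-community-release-el7-1.noarch.rpm   or   yum -y install mysql80-community-release-el7-1.noarch.rpm

Method 2:

  Add the official MySQL Yum repository directly from its URL:

    yum localinstall https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm

4.2 Install MySQL server

yum install -y mysql-community-server

 4.3 Log in and change the initial password

Start the service, then look up the temporary root password in the log:

systemctl enable --now mysqld
grep 'A temporary password' /var/log/mysqld.log

Log in with that password and set your own (replace xxxxx):

mysql -uroot -p
ALTER USER 'root'@'localhost' IDENTIFIED BY 'xxxxx';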

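4.4 Configure master-slave replication

Master database

vim /etc/my.cnf

  server-id = 154    # gives the master a unique ID within the replication setup; using the last three digits of the IP is a convenient convention

  log-bin = mysql-bin   # enable the binary log (replication depends on it)

  #binlog-do-db = your_database_name     # uncomment to replicate only one database

Restart the database: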

systemctl restart mysqld 
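
The "last part of the IP" convention for server-id is easy to script when several database hosts are involved. A tiny sketch, with a function name of my own choosing; note that the result is only unique while all servers share the same /24 network.

```python
import ipaddress


def server_id_from_ip(ip):
    """Derive a MySQL server-id from the last octet of an IPv4 address."""
    last_octet = int(ipaddress.IPv4Address(ip)) & 0xFF
    if last_octet == 0:
        # server-id 0 is not usable for replication
        raise ValueError("last octet is 0; pick the server-id by hand")
    return last_octet


print(server_id_from_ip("192.168.163.154"))
```

For the two database hosts in this guide this gives 154 and 155, matching the values set in my.cnf.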

Create a user for replication

create user 'copy_user'@'%' IDENTIFIED  with mysql_native_password  by 'xxxxx';
GRANT REPLICATION SLAVE ON *.* TO 'copy_user'@'%';
FLUSH PRIVILEGES;

Once that is done, note the current binary log file name and position; the slave needs them to start replicating

SHOW MASTER STATUS;    or    SHOW MASTER STATUS\G

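Slave database

vim /etc/my.cnf


  server-id = 155                   # gives the slave its own unique ID (it must differ from the master's); using the last three digits of the IP is a convenient convention
relay_log = mysql-relay     # enable the relay log
read_only = 1                       # make the slave read-only
log_bin = mysql-bin            # enable the binary log on the slave

#log_slave_updates = 1      # also write replicated changes to the slave's own binary log



Restart the database: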

systemctl restart mysqld 

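Point the slave at the master (the host, user, and password must match what was set up on the master above; adjust them to your environment):

CHANGE MASTER TO
master_host = '192.168.163.154',           # IP address of the master
master_user = 'copy_user',                 # replication account created on the master
master_password = 'xxxxx',                 # password of that account
master_log_file = 'mysql-bin.000001',      # binary log file to start from (taken from SHOW MASTER STATUS)
master_log_pos = 817;                      # position within that file (taken from SHOW MASTER STATUS)

Start the replication threads

start slave;

 Check the replication status on the slave

SHOW SLAVE STATUS\G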
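
In the SHOW SLAVE STATUS output, the fields that matter most are Slave_IO_Running, Slave_SQL_Running (both should be Yes), and Seconds_Behind_Master. Below is a small sketch that turns one status row, given as a dict, into a yes/no answer. The function name and the 30-second lag threshold are my own assumptions.

```python
def replication_healthy(status, max_lag_seconds=30):
    """Return True if both replication threads run and the lag is within the limit."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")  # None (NULL) when replication is not running
    return io_ok and sql_ok and lag is not None and int(lag) <= max_lag_seconds


print(replication_healthy(
    {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0}
))
```

With PyMySQL, a cursor created with `pymysql.cursors.DictCursor` returns the row in this dict form after executing SHOW SLAVE STATUS.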

5. Deploying the Flask application to the worker nodes

pachong02.py (the module name that app.py imports)

#----------------------------------------------------------------------------------------------------------------------#
import os
from selenium import webdriver
#from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time
import csv
import concurrent.futures
import threading
import sys
#import requests
from selenium.webdriver.remote.webdriver import WebDriver  # the correct import path
#----------------------------------------------------------------------------------------------------------------------#
# Serializes CSV writes: process_job runs in several threads that share one writer
csv_lock = threading.Lock()

def process_job(job,writer):
    try:
        company_name = job.find_element(By.CSS_SELECTOR, ".company-name a").text   # company name
        job_name = job.find_element(By.CSS_SELECTOR, ".job-name").text             # job title
        job_area = job.find_element(By.CSS_SELECTOR,".job-area").text              # job location

        jingyan = (job.find_elements(By.CSS_SELECTOR,".tag-list li"))[0].text         # experience
        xueli = (job.find_elements(By.CSS_SELECTOR,".tag-list li"))[1].text           # education
        guimo = (job.find_elements(By.CSS_SELECTOR,".company-tag-list li"))[2].text   # company size

        salary = job.find_element(By.CSS_SELECTOR, ".salary").text                 # salary
        info_dsec = job.find_element(By.CSS_SELECTOR, ".info-desc").text           # benefits

        with csv_lock:
            writer.writerow([company_name, job_name, job_area, guimo, jingyan, xueli, salary, info_dsec]) # write the posting to the file

    except Exception as e:
        print(f"处理岗位失败: {e}")
#----------------------------------------------------------------------------------------------------------------------#
def pachong(gwname,city,experience,companyPeopleNumber,xueli):
    # Set the Chrome browser options
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # headless mode: no browser window is shown
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--no-sandbox")
    # Added later:
    chrome_options.add_argument("--disable-extensions")
    chrome_options.add_argument("--incognito")
    # chrome_options.add_argument("start-maximized")  # maximize the browser window
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")  # hide the "controlled by WebDriver" feature
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")  # mimic a regular browser
    # Second round of additions
    chrome_options.add_argument("--disable-javascript")  # intended to disable JavaScript
    chrome_options.add_argument("--disable-cookies")  # intended to disable cookies
    # When Selenium ran driver.get(url), the Chrome tab crashed and the WebDriver session was deleted.
    # Giving Chrome more memory or turning off unneeded features helps.
    # --disable-dev-shm-usage in particular keeps Chrome from crashing in a Docker container when shared memory runs short.
    chrome_options.add_argument("--disable-dev-shm-usage")  # avoid /dev/shm so that low shared memory does not crash Chrome
# ----------------------------------------------------------------------------------------------------------------------#
    # URL of the Selenium Grid, pointing at the Service created in Kubernetes
    # selenium_url = "http://selenium-service:4444/wd/hub"
    # Read the Selenium service address from an environment variable
    selenium_url = os.getenv("SELENIUM_URL", "http://selenium-service:4444/wd/hub")
    # /wd/hub is the conventional REST API path of a Selenium Hub. Clients (such as this scraper container) talk to the hub over HTTP through it to drive the browser.

    # Driver path for a local browser (download the matching driver and replace the path)
    # driver_path = "/app/chromedriver"
    # service = Service(driver_path)  # use Service to point at chromedriver

    # Create the browser object
    # driver = webdriver.Chrome(service=service, options=chrome_options)
    driver = webdriver.Remote(    # start a Remote WebDriver session
        command_executor=selenium_url,
        options=chrome_options
    )
#----------------------------------------------------------------------------------------------------------------------#
    # Values submitted from gangwei.html, used to build the URL.
    # Each optional filter arrives as "<code> <label>"; an empty value means the filter is not set.
# ----------------------------------------------------------------------------------------------------------------------#
    # City
    if not city:
        cs = ""
        city_pachong = city_chinese = ""
    else:
        cs = ",城市_"
        city_pachong, city_chinese = city.split(" ")[0], city.split(" ")[1]
# ----------------------------------------------------------------------------------------------------------------------#
    # Work experience
    if not experience:
        jy = ""
        experience_pachong = experience_chinese = ""
    else:
        jy = ",经验_"
        experience_pachong, experience_chinese = experience.split(" ")[0], experience.split(" ")[1]
# ----------------------------------------------------------------------------------------------------------------------#
    # Company size
    if not companyPeopleNumber:
        gm = ""
        companyPeopleNumber_pachong = companyPeopleNumber_chinese = ""
    else:
        gm = ",规模_"
        companyPeopleNumber_pachong, companyPeopleNumber_chinese = companyPeopleNumber.split(" ")[0], companyPeopleNumber.split(" ")[1]
# ----------------------------------------------------------------------------------------------------------------------#
    # Education
    if not xueli:
        xl = ""
        xueli_pachong = xueli_chinese = ""
    else:
        xl = ",学历_"
        xueli_pachong, xueli_chinese = xueli.split(" ")[0], xueli.split(" ")[1]
# ----------------------------------------------------------------------------------------------------------------------#
    page = 1  # page number (the site has 10 pages; only this one is fetched)
#----------------------------------------------------------------------------------------------------------------------#
    url = f'https://www.zhipin.com/web/geek/job?query={gwname}&city={city_pachong}&experience={experience_pachong}' \
          f'&degree={xueli_pachong}&scale={companyPeopleNumber_pachong}&page={page}' # URL of the target page
    # The shared folder is mounted in the container at /app/pachong_download_file
    folder_path = "/app/pachong_download_file"  # shared path mounted inside the container
    # Create the folder if it does not exist
    os.makedirs(folder_path, exist_ok=True)
    # Build the file name
    file_name = f"岗位_{gwname}{cs}{city_chinese}{gm}{companyPeopleNumber_chinese}{jy}{experience_chinese}{xl}{xueli_chinese}.csv"
    # Full path of the file
    full_file_path = os.path.join(folder_path,file_name)
#----------------------------------------------------------------------------------------------------------------------#
    try:
        # Open the page
        driver.get(url)
        # Wait up to 60 s for the job cards to load (WebDriverWait raises TimeoutException otherwise)
        ele = WebDriverWait(driver, 60).until(
            EC.presence_of_element_located((By.CLASS_NAME, "job-card-left"))
        )
        # tishi = None
        # if ele  == None:
        #     tishi = "加载页面超时或出错。未找到满足要求的岗位,请尝试更换查询条件!"
        # Find the containers of all job postings   #print("job_list",job_list)
        job_list = driver.find_elements(By.CSS_SELECTOR, ".job-list-box .job-card-wrapper")
        # Open the file for writing
        with open(full_file_path, 'w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            # Write the header row
            writer.writerow(["公司名称", "岗位名称", "地点信息" , "公司规模" , "工作经验" , "学历要求" ,"薪水" , "福利待遇"])
            # Process the postings in parallel with a ThreadPoolExecutor
            with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
                # Pass each job and the writer to process_job
                executor.map(lambda job: process_job(job, writer), job_list)
    finally:
        # Always close the browser, even if loading or parsing fails, so sessions do not leak
        driver.quit()
    # Return the file path and file name (used by the gangwei view in app.py)
    return full_file_path,file_name
#----------------------------------------------------------------------------------------------------------------------#
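
The scraper repeats the same "split on a space" logic for each filter and builds the query string by hand. As a possible refactor, here is a sketch with one helper for the "<code> <label>" values and `urllib.parse.urlencode` for the URL, which also escapes non-ASCII job titles. The helper names and the sample city code are illustrative assumptions, not values taken from the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.zhipin.com/web/geek/job"


def split_option(value):
    """Split a form value of the shape '<code> <label>' into (code, label); empty input gives ('', '')."""
    if not value:
        return "", ""
    code, _, label = value.partition(" ")
    return code, label


def build_search_url(query, city="", experience="", degree="", scale="", page=1):
    """Build the search URL from the raw form values, percent-encoding each parameter."""
    params = {
        "query": query,
        "city": split_option(city)[0],
        "experience": split_option(experience)[0],
        "degree": split_option(degree)[0],
        "scale": split_option(scale)[0],
        "page": page,
    }
    return BASE_URL + "?" + urlencode(params)


# "101010100 北京" is a made-up example of a "<code> <label>" form value
print(build_search_url("python", city="101010100 北京"))
```

Unlike indexing with `split(" ")[1]`, `partition` does not raise when a value has no space in it; the label simply comes back empty.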


app.py

import pachong02
from flask import Flask, render_template, request, redirect, url_for, session, flash
from flask_sqlalchemy import SQLAlchemy
from flask_socketio import SocketIO, send
import hashlib
import subprocess
import os
import time
from flask import send_from_directory
import sys
import csv
from flask import render_template, send_from_directory
from flask import send_file
from pachong02 import pachong
import socket
import pymysql
pymysql.install_as_MySQLdb()  # use PyMySQL in place of MySQLdb (python:3.9-slim does not ship MySQLdb, so it would fail otherwise)
from flask import jsonify

# Initialize the Flask application
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql://root:001928_Llt@192.168.163.154/html01'  # adjust to your own database settings
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.secret_key = 'your_secret_key'  # replace with the secret key you generated

# Initialize the database and SocketIO
db = SQLAlchemy(app)
socketio = SocketIO(app)

# User model
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(50), unique=True, nullable=False)
    password = db.Column(db.String(255), nullable=False)
    created_at = db.Column(db.DateTime, default=db.func.current_timestamp())

class History(db.Model):
    __tablename__ = 'history'  # explicit table name

    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'), nullable=False)
    gangwei_name = db.Column(db.String(255), nullable=True)
    city = db.Column(db.String(255), nullable=True)
    scale = db.Column(db.String(255), nullable=True)
    experience = db.Column(db.String(255), nullable=True)
    xueli = db.Column(db.String(255), nullable=True)
    used_time = db.Column(db.String(255), nullable=True)

    filename = db.Column(db.String(255), nullable=True)
    filepath = db.Column(db.String(255), nullable=True)
    created_at = db.Column(db.DateTime, default=db.func.current_timestamp())

    user = db.relationship('User', backref=db.backref('history', lazy=True))
    def __repr__(self):
        return f"<History {self.id}>"

# Create the database tables inside the application context
with app.app_context():
    db.create_all()
# ___________________________________________________________________________________________________________________________

# Home page route
@app.route('/')
def index():
    # pod_ip = socket.gethostbyname(socket.gethostname())
    # print(f"This request was handled by Pod with IP: {pod_ip}")
    return render_template('shouye.html')


# User registration
@app.route('/register', methods=['GET', 'POST'])
def register():
    # pod_ip = socket.gethostbyname(socket.gethostname())

    if request.method == 'POST':
        username = request.form['username']
        password = hashlib.sha256(request.form['password'].encode()).hexdigest()

        # Check whether the username already exists
        existing_user = User.query.filter_by(username=username).first()
        if existing_user:
            # The username is taken: flash an error message
            flash('用户名已存在,请选择其他用户名', 'error')
            return redirect(url_for('register'))

        # The username is free: create the new user
        user = User(username=username, password=password)
        db.session.add(user)
        db.session.commit()

        # Store the user ID and username in the session
        session['user_id'] = user.id
        session['username'] = user.username

        return redirect(url_for('gangwei'))

    return render_template('register.html')


# User login
@app.route('/login', methods=['GET', 'POST'])
def login():
    # pod_ip = socket.gethostbyname(socket.gethostname())

    if request.method == 'POST':
        username = request.form['username']
        password = hashlib.sha256(request.form['password'].encode()).hexdigest()
        user = User.query.filter_by(username=username, password=password).first()
        if user:
            session['user_id'] = user.id
            session['username'] = user.username  # store the username in the session
            return redirect(url_for('gangwei'))
        else:
            return '用户名或密码错误!', 403

    return render_template('login.html')


# User logout
@app.route('/logout')
def logout():
    # pod_ip = socket.gethostbyname(socket.gethostname())

    session.pop('user_id', None)
    session.pop('username', None)  # clear the username on logout

    return redirect(url_for('index'))


# Job search page
@app.route('/gangwei', methods=['GET', 'POST'])
def gangwei():
    # pod_ip = socket.gethostbyname(socket.gethostname())

    if 'user_id' not in session:
        return redirect(url_for('login'))

    user = db.session.get(User, session['user_id'])

    no_jobs_found = False  # assume postings are found by default

    if request.method == 'POST':
        # Start timing
        # start_time = round(time.time(),2)
        start_time = time.time()


        # Validate the input: the job title is required
        if not request.form['gangwei_name']:
            return "必填岗位字段,未填写完整,请检查并重试。", 400
        gangwei_name = request.form['gangwei_name']

        # The optional filters arrive as "<code> <label>"; the label is what gets stored in MySQL
        if not request.form['city']:
            city_mysql = ""
        else:
            city_mysql = request.form['city'].split(" ")[1]

        if not request.form['experience']:
            experience_mysql = ""
        else:
            experience_mysql = request.form['experience'].split(" ")[1]

        if not request.form['guimo']:
            guimo_mysql = ""
        else:
            guimo_mysql = request.form['guimo'].split(" ")[1]

        if not request.form['xueli']:
            xueli_mysql = ""
        else:
            xueli_mysql = request.form['xueli'].split(" ")[1]

        # Run the scraper and record the path of the file it writes
        filepath, filename = pachong(gangwei_name, request.form['city'], request.form['experience']
                                     ,request.form['guimo'],request.form['xueli'])

        # Check whether the scraper returned a data file
        if not filepath:
            no_jobs_found = True

        # Stop timing
        end_time = time.time()
        used_time = format(end_time - start_time , ".2f")+'s'
        used_time02 = end_time - start_time

        # Save the query to the database
        history = History(user_id=session['user_id'], gangwei_name=gangwei_name, city=city_mysql, scale=guimo_mysql,
                          filepath=filepath, experience=experience_mysql, filename=filename, xueli=xueli_mysql,used_time=used_time)
        db.session.add(history)
        db.session.commit()
        # Fetch the history record that was just saved
        # history = History.query.filter_by(user_id=session['user_id']).order_by(History.created_at.desc()).first()

        # Render the template with the history record and file path
        # return render_template('gangwei.html', username=user.username, history=history, no_jobs_found=no_jobs_found)
        return jsonify(success=True, time_used=used_time02, message="查询成功")
    # Render the template on GET
    return render_template('gangwei.html', username=user.username)


# History page
@app.route('/jilu')
def jilu():
    # pod_ip = socket.gethostbyname(socket.gethostname())

    if 'user_id' not in session:
        return redirect(url_for('login'))
    # Fetch the user's query history

    history = History.query.filter(History.user_id == session['user_id']).all()

    return render_template('jilu.html', history=history)


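# Record details and visualization page
@app.route('/view_file/<int:history_id>')
def view_file(history_id):
    # pod_ip = socket.gethostbyname(socket.gethostname())

    # Require a logged-in user, as the other pages do
    if 'user_id' not in session:
        return redirect(url_for('login'))

    # Look up the history record
    record = History.query.get_or_404(history_id)
    # Only the owner of a record may view it
    if record.user_id != session['user_id']:
        return "Forbidden", 403
    full_file_path = record.filepath
    # file_name = record.filename
    #
    # # If the stored path were relative to the shared directory, it would be joined like this
    # shared_directory = '/app/pachong_download_file'  # shared directory mounted in the container
    # full_file_path = os.path.join(shared_directory, file_name)

    # Make sure the file exists (filepath is nullable in the model)
    if not full_file_path or not os.path.exists(full_file_path):
        return "文件未找到", 404

    # Open the CSV file and read its rows
    rows = []

    with open(full_file_path, newline='', encoding='utf-8') as file:
        csv_reader = csv.reader(file)
        for row in csv_reader:
            rows.append(row)

    # Render the record together with the file contents
    return render_template('view_file.html', record=record, rows=rows)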


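@app.route('/download_file/<int:history_id>')
def download_file(history_id):
    # Require a logged-in user, as the other pages do
    if 'user_id' not in session:
        return redirect(url_for('login'))

    # Look up the history record
    record = History.query.get_or_404(history_id)
    # Only the owner of a record may download it
    if record.user_id != session['user_id']:
        return "Forbidden", 403
    full_file_path = record.filepath
    # # Get the file name
    # file_name = record.filename
    # # If the stored path were relative to the shared directory, it would be joined like this
    # shared_directory = '/app/pachong_download_file'  # shared directory mounted in the container
    # full_file_path = os.path.join(shared_directory, file_name)

    # Make sure the file exists (filepath is nullable in the model)
    if not full_file_path or not os.path.exists(full_file_path):
        return "文件未找到", 404

    return send_file(full_file_path, as_attachment=True)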

# Start the Flask application
if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5002)
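
The register and login views store an unsalted SHA-256 hash of the password, so identical passwords produce identical hashes and are cheap to brute-force. As a hedged alternative, here is a sketch using PBKDF2 from the standard library with a random salt per user. The storage format and function names are my own choice, and the iteration count is an assumption you should tune; the resulting string (about 120 characters) still fits the String(255) password column.

```python
import hashlib
import hmac
import os


def hash_password(password, iterations=200_000):
    """Return 'pbkdf2_sha256$<iterations>$<salt hex>$<digest hex>' for storage."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

With this scheme, login can no longer filter by `password=...` in the query, because the hash differs on every call. Instead, load the user by username and call `verify_password(request.form['password'], user.password)`.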

HTML templates:

shouye.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>岗位查询系统</title>

    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/css/bootstrap.min.css">
</head>
<body>
<div class="container mt-5">
    <h4>岗位查询系统</h4>
    <div class="card">
            <div class="card-body">
                <a href="{{ url_for('register') }}" >注册</a><br><br>
                <a href="{{ url_for('login') }}" >登录</a><br>
<!--                <br>This request was handled with IP: {{pod_ip}}-->
            </div>
    </div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/js/bootstrap.bundle.min.js"></script>

</body>
</html>

register.html


<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>注册</title>

    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/css/bootstrap.min.css">
    <style>
        .error {
            color: red;
            font-size: 0.9em;
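            margin-left: 10px; /* gap between the error message and the input box */
            vertical-align: middle; /* keep the error message vertically centered */
            display: inline; /* keep the error message on the same line as the input box */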
        }
    </style>
</head>
<body>
<div class="container mt-5">
    <h4>注册用户</h4>
    <div class="card">
        <div class="card-body">
            <form method="POST">
                <input type="text" name="username" placeholder="用户名" required>
                    <!-- Show the flashed error message when the username already exists -->
                    {% with messages = get_flashed_messages(with_categories=true) %}
                        {% if messages %}
                            {% for category, message in messages %}
                                {% if category == 'error' %}
                                    <p class="error">{{ message }}</p>
                                {% endif %}
                            {% endfor %}
                        {% endif %}
                    {% endwith %}
                <br><input type="password" name="password" placeholder="密码" required><br>
                <button type="submit" style="border: 1px solid black;">注册并登录</button>
            </form>
            <br><a href="{{ url_for('logout') }}">回到主界面</a>  <!-- 退出按钮 -->

<!--            <br>This request was handled with IP: {{pod_ip}}-->

        </div>
    </div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/js/bootstrap.bundle.min.js"></script>
</body>
</html>


<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Log in</title>

    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/css/bootstrap.min.css">
</head>
<body>
<div class="container mt-5">
    <h4>Log in</h4>
    <div class="card">
        <div class="card-body">
            <form method="POST">
                <input type="text" name="username" placeholder="Username" required><br>
                <input type="password" name="password" placeholder="Password" required><br>
                <button type="submit" style="border: 1px solid black;">Log in</button>
            </form>
            <br><a href="{{ url_for('logout') }}">Back to home page</a>  <!-- logout link -->

<!--            <br>This request was handled with IP: {{pod_ip}}-->

        </div>
    </div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/js/bootstrap.bundle.min.js"></script>
</body>
</html>


<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>History: preview and download result files</title>

    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/css/bootstrap.min.css">

</head>
<body>
    <div class="container mt-5">
        <h4 style="display: inline;">History</h4>
        <a style="float: right;" href="{{ url_for('gangwei') }}">Back to search</a>
        <div class="card">
            <div class="card-body">

                <a style="float: right;" href="{{ url_for('logout') }}">Log out</a><br>

                <!-- Show the search history (each row: job title, city, company size, experience, education, search time, time used, view file, download file) -->
                {% if history %}
                    <div class="table-responsive">  <!-- responsive wrapper so the table can scroll on small screens -->
                        <table class="table table-bordered table-striped">
                            <thead>
                                <tr>
                                    <th>Job title</th>
                                    <th>City</th>
                                    <th>Company size</th>
                                    <th>Experience</th>
                                    <th>Education</th>
                                    <th>Search time</th>
                                    <th>Time used</th>
                                    <th>View file</th>
                                    <th>Download file</th>
                                </tr>
                            </thead>
                            <tbody>
                                {% for record in history %}
                                    <tr>
                                        <td>{{ record.gangwei_name }}</td>
                                        <td>{{ record.city }}</td>
                                        <td>{{ record.scale }}</td>
                                        <td>{{ record.experience }}</td>
                                        <td>{{ record.xueli }}</td>
                                        <td>{{ record.created_at }}</td>
                                        <td>{{ record.used_time }}</td>
                                        <td><a href="{{ url_for('view_file', history_id=record.id) }}">View</a></td>
                                        <td><a href="{{ url_for('download_file', history_id=record.id) }}">Download</a></td>
                                    </tr>
                                {% endfor %}
                            </tbody>

<!--                            <br>This request was handled with IP: {{pod_ip}}-->

                        </table>
                    </div>
                {% else %}
                    <p>No history yet.</p>
                {% endif %}
                <br>

            </div>
        </div>
    </div>

    <script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/js/bootstrap.bundle.min.js"></script>

</body>
</html>
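
The history template reads a fixed set of attributes from each item in history. A sketch of a record type carrying exactly those attributes, assuming a hypothetical `HistoryRecord` dataclass and `as_row` helper; the original application may well pass database rows or ORM objects instead:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoryRecord:
    """Hypothetical shape of one row in the history table."""
    id: int
    gangwei_name: str
    city: str
    scale: str
    experience: str
    xueli: str
    created_at: datetime
    used_time: float

    def as_row(self):
        """Return the displayed cells in the same order as the table columns."""
        return [
            self.gangwei_name,
            self.city,
            self.scale,
            self.experience,
            self.xueli,
            self.created_at.strftime("%Y-%m-%d %H:%M:%S"),
            f"{self.used_time:.2f}",
        ]
```

The `id` field is not displayed; the template uses it only to build the view_file and download_file links.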

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Search jobs</title>

    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/css/bootstrap.min.css">
    <style>
        /* Overlay */
        #overlay {
            display: none;
            position: fixed;
            top: 0;
            left: 0;
            width: 100%;
            height: 100%;
            background-color: rgba(0, 0, 0, 0.5);
            z-index: 1000;
        }

        /* Popup */
        #popup {
            display: none;
            position: fixed;
            top: 50%;
            left: 50%;
            transform: translate(-50%, -50%);
            width: 300px;
            padding: 20px;
            background: white;
            box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
            border-radius: 8px;
            text-align: center;
            z-index: 1001;
        }

        /* Close button styles, currently disabled:
        #popup button {
            margin-top: 10px;
            padding: 5px 10px;
            background-color: #6C757D;  (gray)
            color: white;
            border: none;
            cursor: pointer;
        }
        Hover effect for the button:
        #popup button:hover {
            background-color: #343A40;  (dark gray)
        }
        */
    </style>
</head>
<body>
    <div class="container mt-5">
        <h4 style="display: inline;">Welcome to the Job Search System</h4>
        <a style="float: right;" href="{{ url_for('logout') }}">Log out</a>
        <div class="card">
            <div class="card-body">
                <a style="margin-right: 130px;">Hello, {{ username }}</a><br><br>
                <a>Note: fill in the job criteria you need. After submitting, wait a few seconds; once the popup reports that the search has finished, the results are available under History.</a><br><br>

                <form method="POST" id="query-form">
                    <a>Job:</a>
                    <input size=10 type="text" name="gangwei_name" placeholder="Enter a job title" required>

                    <a>&nbsp;&nbsp;&nbsp;City:</a>
                    <select name="city">
                        <option value="">Select a city</option>
                        <option value="100010000 全国">Nationwide</option>
                        <option value="101010100 北京">Beijing</option>
                        <option value="101020100 上海">Shanghai</option>
                        <option value="101280100 广州">Guangzhou</option>
                        <option value="101280600 深圳">Shenzhen</option>
                        <option value="101210100 杭州">Hangzhou</option>
                        <option value="101030100 天津">Tianjin</option>
                        <option value="101110100 西安">Xi'an</option>
                        <option value="101190400 苏州">Suzhou</option>
                        <option value="101200100 武汉">Wuhan</option>
                        <option value="101230200 厦门">Xiamen</option>
                        <option value="101250100 长沙">Changsha</option>
                        <option value="101270100 成都">Chengdu</option>
                        <option value="101180100 郑州">Zhengzhou</option>
                        <option value="101040100 重庆">Chongqing</option>
                    </select>

                    <a>&nbsp;&nbsp;&nbsp;Size:</a>
                    <select name="guimo">
                        <option value="">Select a company size</option>
                        <option value="301 0-20人">0-20 employees</option>
                        <option value="302 20-99人">20-99 employees</option>
                        <option value="303 100-499人">100-499 employees</option>
                        <option value="304 500-999人">500-999 employees</option>
                        <option value="305 1000-9999人">1000-9999 employees</option>
                        <option value="306 10000人以上">10000+ employees</option>
                    </select>

                    <a>&nbsp;&nbsp;&nbsp;Experience:</a>
                    <select name="experience">
                        <option value="">Select experience</option>
                        <option value="0 不限">Any</option>
                        <option value="108 在校生">Current student</option>
                        <option value="102 应届生">Fresh graduate</option>
                        <option value="101 经验不限">No experience requirement</option>
                        <option value="103 1年以内">Less than 1 year</option>
                        <option value="104 1-3年">1-3 years</option>
                        <option value="105 3-5年">3-5 years</option>
                        <option value="106 5-10年">5-10 years</option>
                        <option value="107 10年以上">More than 10 years</option>
                    </select>

                    <a>&nbsp;&nbsp;&nbsp;Education:</a>
                    <select name="xueli">
                        <option value="">Select education</option>
                        <option value="0 不限">Any</option>
                        <option value="209 初中及以下">Junior high school or below</option>
                        <option value="208 中专中技">Secondary vocational or technical school</option>
                        <option value="206 高中">High school</option>
                        <option value="202 大专">Associate degree</option>
                        <option value="203 本科">Bachelor's degree</option>
                        <option value="204 硕士">Master's degree</option>
                        <option value="205 博士">Doctorate</option>
                    </select>

                    <a>&nbsp;&nbsp;</a>
                    <button  type="submit" style="border: 1px solid black; padding: 0px 5px; border-radius: 5px; ">Search</button>
                    <br><br>

                </form>
                <a>Records:</a><a href="{{ url_for('jilu') }}">History</a>

<!--                Show a small popup with "Searching..." while the query runs-->
<!--                then show that the crawl finished and how many seconds it took-->

            </div>
        </div>
    </div>
<!--------------------------------------------------------------------------------------------------------------------->
<!--------------------------------------------------------------------------------------------------------------------->
    <!-- Overlay and popup -->
    <div id="overlay"></div>
    <div id="popup">
        <p id="popup-message">Searching...</p>
        <button id="popup-close">Close</button>
    </div>

    <script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.1.0/js/bootstrap.bundle.min.js"></script>

    <script>
        const form = document.getElementById('query-form');
        const overlay = document.getElementById('overlay');
        const popup = document.getElementById('popup');
        const popupMessage = document.getElementById('popup-message');
        const popupCloseButton = document.getElementById('popup-close');

        // Show the popup with the given message
        function showPopup(message) {
            overlay.style.display = 'block';
            popup.style.display = 'block';
            popupMessage.textContent = message;
        }

        // Hide the popup
        function hidePopup() {
            overlay.style.display = 'none';
            popup.style.display = 'none';
        }

        // Bind the click handler of the close button
        popupCloseButton.addEventListener('click', hidePopup);

        // Handle form submission
        form.addEventListener('submit', function (event) {
            event.preventDefault(); // prevent the default page-reloading submit

            // Show the "searching" popup
            showPopup('Searching...');

            // Collect the form data
            const formData = new FormData(form);

            // Send the AJAX request
            fetch('/gangwei', {
                method: 'POST',
                body: formData
            })
            .then(response => response.json())
            .then(data => {
                // Check whether the search succeeded
                if (data.success) {
                    showPopup(`Search finished in ${data.time_used.toFixed(2)} seconds`);
                } else {
                    showPopup('Search failed. Please check your criteria.');
                }
                // Close the popup after 2 seconds
                setTimeout(hidePopup, 2000);
            })
            .catch(error => {
                console.error('Error:', error);
                showPopup('Search failed. Please try again later.');
                // Close the popup after 2 seconds
                setTimeout(hidePopup, 2000);
            });
        });
    </script>

</body>
</html>
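
Each option value in the search form packs a numeric code and a display name separated by a space (for example `101010100 北京`), and the page script expects a JSON reply with `success` and `time_used` fields. A sketch of how a handler might split those values and shape the reply, assuming hypothetical helpers named `split_option` and `build_result`; the original `/gangwei` view may do this differently:

```python
import json


def split_option(value):
    """Split 'code name' into (code, name); an empty selection gives ('', '')."""
    if not value:
        return "", ""
    code, _, name = value.partition(" ")
    return code, name


def build_result(success, time_used=0.0):
    """Build the JSON text read by the fetch() callback on the search page."""
    return json.dumps({"success": bool(success), "time_used": float(time_used)})
```

`time_used` is kept numeric on purpose, because the script calls `toFixed(2)` on it; a string value would make that call fail in the browser.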







6. Set up NFS shared storage and create the PV and PVC

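6.1 Install and configure the NFS server

Install the NFS packages. Run this on the NFS server and also on every node that will mount the share, because the nodes need the NFS client tools from the same package:
yum install -y nfs-utils

Create a shared directory and set its permissions:
mkdir -p /mnt/data/share
chmod 777 /mnt/data/share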

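Edit /etc/exports so that the Kubernetes nodes are allowed to mount the directory (the * allows any client; restrict it to your node subnet if needed):
echo "/mnt/data/share *(rw,sync,no_root_squash)" | sudo tee -a /etc/exports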
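
Because `tee -a` appends, running the command a second time leaves a duplicate entry in /etc/exports. A sketch of an idempotent variant, assuming a hypothetical helper named `ensure_export` that works on the text of the file:

```python
def ensure_export(exports_text, entry):
    """Return exports_text with entry appended unless an identical line exists."""
    wanted = entry.strip()
    if any(line.strip() == wanted for line in exports_text.splitlines()):
        return exports_text
    # Make sure the existing content ends with a newline before appending
    if exports_text and not exports_text.endswith("\n"):
        exports_text += "\n"
    return exports_text + wanted + "\n"
```

A script could read /etc/exports, pass its content through `ensure_export`, write the result back, and then run exportfs -a as described in this section.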

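Start the NFS service and enable it at boot:
systemctl start nfs-server
systemctl enable nfs-server

Export the file system:
exportfs -a

If firewalld is running, allow NFS traffic (this step can be skipped when firewalld was disabled as in section 1.4):
firewall-cmd --permanent --zone=public --add-service=nfs
firewall-cmd --reload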

6.2 Configure the PV and PVC in Kubernetes

Create a PV manifest, nfs-pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi  # adjust the size as needed
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /mnt/data/share  # shared path on the NFS server
    server: node01        # hostname or IP address of the NFS server; it must resolve from every node


Create a PVC manifest, nfs-pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

Apply the PV and PVC manifests:
kubectl apply -f nfs-pv.yaml
kubectl apply -f nfs-pvc.yaml
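
For a statically provisioned volume like this one, the claim can bind only if the PV offers at least the requested capacity and every requested access mode. A simplified sketch of that check, assuming hypothetical helpers named `parse_quantity` and `can_bind`; it handles only binary suffixes such as Gi, and the real Kubernetes binder also compares storage class, volume mode, and selectors, which are left out here:

```python
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(text):
    """Convert a binary-suffixed quantity such as '10Gi' to bytes."""
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # plain byte count without a suffix


def can_bind(pv, pvc):
    """Simplified model: compare capacity and access modes only."""
    enough = parse_quantity(pv["capacity"]) >= parse_quantity(pvc["request"])
    modes_ok = set(pvc["accessModes"]) <= set(pv["accessModes"])
    return enough and modes_ok


# Values taken from nfs-pv.yaml and nfs-pvc.yaml
pv = {"capacity": "10Gi", "accessModes": ["ReadWriteMany"]}
pvc = {"request": "10Gi", "accessModes": ["ReadWriteMany"]}
```

With the two manifests from this section, `can_bind(pv, pvc)` is true; raising the request above 10Gi would leave the claim pending. On a live cluster, kubectl get pv,pvc shows the actual binding status.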

6.3 Mount the NFS storage in a container

Save the following Pod manifest as pachong-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: pachong-pod
spec:
  containers:
    - name: pachong-container
      image: your-image
      volumeMounts:
        - mountPath: /app/pachong_download_file  # mount path inside the container
          name: nfs-volume
  volumes:
    - name: nfs-volume
      persistentVolumeClaim:
        claimName: nfs-pvc  # name of the PVC

Apply the manifest:

kubectl apply -f pachong-pod.yaml
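
Once the Pod is running, it can be useful to confirm from inside the container that the mounted directory is actually writable. A small sketch, assuming a hypothetical helper named `is_writable`; inside the container it would be called with the mount path /app/pachong_download_file from the manifest:

```python
import os
import tempfile


def is_writable(directory):
    """Return True if a temporary file can be created and removed in directory."""
    try:
        fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
    except OSError:
        # Covers a missing directory, a read-only mount, and permission errors
        return False
    os.close(fd)
    os.remove(path)
    return True
```

A False result usually points at the export options or the directory permissions on the NFS server rather than at the Pod itself.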

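7. Install an intranet tunneling tool and configure port forwarding

This project uses ngrok. The address to use is listed on the Domains page of the ngrok dashboard.

Installation: open CMD and run the following in the ngrok directory (replace <your-authtoken> with the token from your own ngrok account):

choco install ngrok

ngrok config add-authtoken <your-authtoken>

In the dashboard, click Domains on the left and copy the address, then run:

ngrok http --url=xxxxxx-xxxxxx-xxxxx.ngrok-free.app 80

This exposes port 80 of the host, which forwards to the cluster VIP port, at the public address, so the project can be reached from the public internet.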
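
When the tunnel needs to be started from a script rather than typed by hand, the command can be assembled as an argument list. A sketch assuming hypothetical helpers named `ngrok_command` and `start_tunnel`; it uses only the `http` subcommand and the `--url` flag from the command above, and the domain remains a placeholder:

```python
import subprocess


def ngrok_command(domain, port=80):
    """Build the argument list for `ngrok http --url=<domain> <port>`."""
    if not 0 < int(port) < 65536:
        raise ValueError(f"invalid port: {port}")
    return ["ngrok", "http", f"--url={domain}", str(port)]


def start_tunnel(domain, port=80):
    """Start ngrok in the background and return the process handle."""
    return subprocess.Popen(ngrok_command(domain, port))
```

Passing a list instead of a shell string avoids quoting problems; `start_tunnel` assumes that ngrok is on the PATH and that the authtoken has already been configured.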

8. Final result

Author: 塑梦_
