Easily Scrape Twitter Follower and Following Data with Python and Selenium

Sample of the scraped data

{
  "userId": "95092020",
  "isBlueVerified": true,
  "following": false,
  "canDm": false,
  "canMediaTag": false,
  "createdAt": "Sun Dec 06 23:33:02 +0000 2009",
  "defaultProfile": false,
  "defaultProfileImage": false,
  "description": "Best-Selling Author | Clinical Psychologist | #1 Education Podcast | Enroll to @petersonacademy now:",
  "fastFollowersCount": 0,
  "favouritesCount": 161,
  "followersCount": 5613000,
  "friendCount": 1686,
  "hasCustomTimelines": true,
  "isTranslator": false,
  "listedCount": 14572,
  "location": "",
  "mediaCount": 7318,
  "name": "Dr Jordan B Peterson",
  "normalFollowersCount": 5613000,
  "pinnedTweetIdsStr": [
    "1849105729438790067"
  ],
  "possiblySensitive": false,
  "profileImageUrlHttps": "https://pbs.twimg.com/profile_images/1407056014776614923/TKBC60e1_normal.jpg",
  "profileInterstitialType": "",
  "username": "jordanbpeterson",
  "statusesCount": 51343,
  "translatorType": "none",
  "verified": false,
  "wantRetweets": false,
  "withheldInCountries": []
}
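
As a small illustration of working with a record like the one above, the sketch below parses the `createdAt` timestamp and derives a follower ratio. It uses a trimmed copy of the sample (only the fields it needs), and the helper name `parse_created_at` is made up for this example. It also assumes an English locale, since the day and month names are abbreviated in English.

```python
import json
from datetime import datetime

# Trimmed copy of the sample record above; only the fields used here.
sample = json.loads("""
{
  "userId": "95092020",
  "username": "jordanbpeterson",
  "createdAt": "Sun Dec 06 23:33:02 +0000 2009",
  "followersCount": 5613000,
  "friendCount": 1686
}
""")


def parse_created_at(value):
    """Parse a timestamp such as 'Sun Dec 06 23:33:02 +0000 2009'
    into a timezone-aware datetime (hypothetical helper)."""
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


created = parse_created_at(sample["createdAt"])
# friendCount is the number of accounts this user follows.
ratio = sample["followersCount"] / sample["friendCount"]

print(created.isoformat())
print(f"{sample['username']}: about {ratio:.0f} followers per followed account")
```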

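 Ready-to-run code

This guide provides complete, ready-to-use code for scraping Twitter follow data. Using Python and Selenium, it automates data collection and captures the browser's performance logs to read the underlying responses. Beyond installing the dependencies and ChromeDriver described below, no extra setup is needed.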

  1. Step 1: Set up your environment

    First, install Selenium and the other dependencies for browser automation:

    1. pip install -r requirements.txt
  2. Step 2: Download ChromeDriver

    1. Download ChromeDriver from the ChromeDriver download page so that Selenium can drive the Chrome browser. Pick the build that matches your installed Chrome version.
  3. Step 3: Set Chrome options

    1. self.options = webdriver.ChromeOptions()
      # Present a regular desktop Chrome user agent
      user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
      self.options.add_argument(f'user-agent={user_agent}')
      self.options.add_argument('--disable-gpu')
      self.options.add_argument('--no-sandbox')
      self.options.add_argument('--disable-dev-shm-usage')
      # remote_debugging_port is not defined in this snippet
      self.options.add_argument(f"--remote-debugging-port={remote_debugging_port}")
      
      # modify_random_canvas_js() and get_browser() are helpers that are not shown in this snippet
      js_script_name = modify_random_canvas_js()
      self.browser = self.get_browser(script_files=[js_script_name], record_network_log=True, headless=True)
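
The `get_browser` helper is not shown above, so the following is only a sketch of how comparable options might be assembled with plain Selenium 4. The function names `chrome_arguments` and `build_options`, the default port 9222, and the use of `--headless=new` are assumptions for illustration, not the author's implementation. The `goog:loggingPrefs` capability is what asks ChromeDriver to record the performance log that Step 5 reads.

```python
def chrome_arguments(user_agent, remote_debugging_port):
    """Return the Chrome command-line arguments used in Step 3."""
    return [
        f"user-agent={user_agent}",
        "--disable-gpu",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        f"--remote-debugging-port={remote_debugging_port}",
    ]


def build_options(user_agent, remote_debugging_port=9222, headless=True):
    """Build ChromeOptions with performance logging enabled (hypothetical helper)."""
    # Imported lazily so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(user_agent, remote_debugging_port):
        options.add_argument(argument)
    if headless:
        # Assumption: a recent Chrome that supports the new headless mode.
        options.add_argument("--headless=new")
    # Ask ChromeDriver to record DevTools events in the "performance" log.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    return options


args = chrome_arguments("Mozilla/5.0 (example)", 9222)
print(args)
```

A browser could then be started with `webdriver.Chrome(options=build_options(user_agent))`, provided ChromeDriver from Step 2 is available.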
  4. Step 4: Access the target page

    1. self.browser.switch_to.new_window('tab')
      url = 'https://x.com/1_usd_promotion/following'
      self.browser.get(url=url)
      
      time.sleep(2)
      
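      exist_entry_id = []
      # Collects the scraped records; initialized here so the snippet is self-consistent
      result_list = []
      
      self.get_network(exist_entry_id, result_list)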
      
      print(f'tweet result length = {len(result_list)}')

  5. Step 5: Get the browser performance logs

    1. performance_log = self.browser.get_log("performance")
      for packet in performance_log:
          # Each log entry wraps a DevTools event as a JSON string
          msg = packet.get("message")
          message = json.loads(msg).get("message")
          packet_method = message.get("method")
      
          # Keep only Network events that mention the Following request
          if "Network" in packet_method and 'Following' in msg:
      
              request_id = message.get("params").get("requestId")
      
              # Fetch the response body for that request through the DevTools protocol
              resp = self.browser.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
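
The filter above matches any Network event whose raw text contains 'Following', so the same request can be hit several times. A narrower variant, sketched below under assumptions, keeps only `Network.responseReceived` events whose response URL contains the keyword and returns each request ID once. The function name `matching_request_ids` and the fake log entries (including the `abc` path segment) are invented for illustration; they mimic the shape of ChromeDriver's performance log.

```python
import json


def matching_request_ids(performance_log, keyword="Following"):
    """Return request IDs of Network.responseReceived events whose
    response URL contains `keyword`, without duplicates."""
    request_ids = []
    for packet in performance_log:
        message = json.loads(packet["message"]).get("message", {})
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        url = params.get("response", {}).get("url", "")
        request_id = params.get("requestId")
        if keyword in url and request_id and request_id not in request_ids:
            request_ids.append(request_id)
    return request_ids


def fake_entry(method, params):
    """Build a log entry shaped like one from browser.get_log('performance')."""
    return {"message": json.dumps({"message": {"method": method, "params": params}})}


# Hypothetical entries for demonstration only.
fake_log = [
    fake_entry("Network.responseReceived",
               {"requestId": "1.1",
                "response": {"url": "https://x.com/i/api/graphql/abc/Following"}}),
    fake_entry("Network.requestWillBeSent",
               {"requestId": "1.1",
                "request": {"url": "https://x.com/i/api/graphql/abc/Following"}}),
    fake_entry("Network.responseReceived",
               {"requestId": "1.2", "response": {"url": "https://x.com/home"}}),
]

ids = matching_request_ids(fake_log)
print(ids)
```

Each returned ID can then be passed to `execute_cdp_cmd('Network.getResponseBody', ...)` as in the step above. That call can raise when a body is not available, so wrapping it in `try`/`except` is a sensible precaution.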

  6. Step 6: Extract data from the response

    1. body = resp.get("body")
      body = json.loads(body)
      instructions = body['data']['user']['result']['timeline']['timeline'].get('instructions', None)
      if not instructions:
          continue
      for instruction in instructions:
          entries = instruction.get('entries', None)
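
The snippet above stops at `entries`. One way the loop might continue is sketched here: walk the same path defensively and keep only entries that have not been seen before, which is presumably what `exist_entry_id` is for. The `entryId` key, the function name `collect_entries`, and the fake response body are assumptions made for this example, not a documented schema.

```python
def collect_entries(body, seen_entry_ids):
    """Return timeline entries from a parsed response body, skipping
    any whose ID is already in `seen_entry_ids` (a set, updated in place)."""
    node = body
    # Same path as in Step 6, but tolerant of missing keys.
    for key in ("data", "user", "result", "timeline", "timeline"):
        node = (node or {}).get(key)
    instructions = (node or {}).get("instructions") or []

    new_entries = []
    for instruction in instructions:
        for entry in instruction.get("entries") or []:
            entry_id = entry.get("entryId")  # assumed key name
            if entry_id is None or entry_id in seen_entry_ids:
                continue
            seen_entry_ids.add(entry_id)
            new_entries.append(entry)
    return new_entries


# Hypothetical response body with one duplicate entry.
fake_body = {"data": {"user": {"result": {"timeline": {"timeline": {"instructions": [
    {},
    {"entries": [{"entryId": "user-1"}, {"entryId": "user-2"}, {"entryId": "user-1"}]},
]}}}}}}

seen = set()
first_pass = collect_entries(fake_body, seen)
second_pass = collect_entries(fake_body, seen)
print(len(first_pass), len(second_pass))
```

Passing the same `seen` set on every call means that responses captured again after scrolling do not add duplicate records.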

  7. Step 7: Important notes

  • Log in to Twitter and get your cookie (see "How to Get Twitter Cookie").
  • Use APIs from Apify
  • Get the full code from GitHub
  • Author: 江先森
