
Scrapling Skill Usage Guide

2026-04-01 Source: 网淘吧

Scrapling - Adaptive Web Scraping

"Effortless web scraping for the modern web."


Credits

Core library

API reverse-engineering methodology


Installation

# Core library (parser only)
pip install scrapling

# With fetchers (HTTP + browser automation) - RECOMMENDED
pip install "scrapling[fetchers]"
scrapling install

# With shell (CLI tools) - RECOMMENDED
pip install "scrapling[shell]"

# With AI (MCP server) - OPTIONAL
pip install "scrapling[ai]"

# Everything
pip install "scrapling[all]"

# Browser for stealth/dynamic mode
playwright install chromium

# For Cloudflare bypass (advanced)
pip install cloudscraper

Agent Instructions

When to Use Scrapling

Use Scrapling when:

  • Researching topics from websites
  • Extracting data from blogs, news sites, and documentation
  • Crawling multiple pages with a Spider
  • Gathering content for digests
  • Extracting brand data from any website
  • Reverse engineering APIs from websites

Do NOT use for:

  • X/Twitter (use the x-tweet-fetcher skill)
  • Sites that require login (unless credentials are provided)
  • Paywalled content (respect robots.txt)
  • Sites whose Terms of Service prohibit scraping
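The robots.txt rule above can be checked programmatically before fetching. A minimal sketch using only the standard library (is_allowed is a hypothetical helper, not part of scrapling):

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against already-fetched robots.txt rules."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# Fetch the rules once per domain, e.g. Fetcher.get(f"{base}/robots.txt").text
rules = "User-agent: *\nDisallow: /private/"
print(is_allowed(rules, "https://example.com/blog/post"))  # True
print(is_allowed(rules, "https://example.com/private/x"))  # False
```

Checking once per domain and caching the parser keeps the overhead negligible.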

Quick Commands

1. Basic Fetching (most common)

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com')

# Extract content
title = page.css('h1::text').get()
paragraphs = page.css('p::text').getall()

2. Stealthy Fetching (anti-bot / Cloudflare)

from scrapling.fetchers import StealthyFetcher

StealthyFetcher.adaptive = True
page = StealthyFetcher.fetch('https://example.com', headless=True, solve_cloudflare=True)

3. Dynamic Fetching (full browser automation)

from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch('https://example.com', headless=True, network_idle=True)

4. Adaptive Parsing (survives design changes)

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com')

# First scrape - saves selectors
items = page.css('.product', auto_save=True)

# Later - if site changes, use adaptive=True to relocate
items = page.css('.product', adaptive=True)

5. Spider (multi-page)

from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"
    start_urls = ["https://example.com"]
    concurrent_requests = 3
    
    async def parse(self, response: Response):
        for item in response.css('.item'):
            yield {"item": item.css('h2::text').get()}
        
        # Follow links
        next_page = response.css('.next a')
        if next_page:
            yield response.follow(next_page[0].attrib['href'])

MySpider().start()

6. CLI Usage

# Simple fetch to file
scrapling extract get https://example.com content.html

# Stealthy fetch (bypass anti-bot)
scrapling extract stealthy-fetch https://example.com content.html

# Interactive shell
scrapling shell https://example.com

Common Patterns

Extracting Article Content

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com/article')

# Try multiple selectors for title
title = (
    page.css('[itemprop="headline"]::text').get() or
    page.css('article h1::text').get() or
    page.css('h1::text').get()
)

# Get paragraphs
content = page.css('article p::text, .article-body p::text').getall()

print(f"Title: {title}")
print(f"Paragraphs: {len(content)}")

Researching Multiple Pages

from scrapling.spiders import Spider, Response

class ResearchSpider(Spider):
    name = "research"
    start_urls = ["https://news.ycombinator.com"]
    concurrent_requests = 5
    
    async def parse(self, response: Response):
        for item in response.css('.titleline a::text').getall()[:10]:
            yield {"title": item, "source": "HN"}
        
        more = response.css('.morelink::attr(href)').get()
        if more:
            yield response.follow(more)

ResearchSpider().start()

Crawling an Entire Site (easy mode)

Automatically crawl every page on a domain by following internal links:

from scrapling.spiders import Spider, Response
from urllib.parse import urljoin, urlparse

class EasyCrawl(Spider):
    """Auto-crawl all pages on a domain."""
    
    name = "easy_crawl"
    start_urls = ["https://example.com"]
    concurrent_requests = 3
    
    def __init__(self):
        super().__init__()
        self.visited = set()
    
    async def parse(self, response: Response):
        # Extract content
        yield {
            'url': response.url,
            'title': response.css('title::text').get(),
            'h1': response.css('h1::text').get(),
        }
        
        # Follow internal links (limit to 50 pages)
        if len(self.visited) >= 50:
            return
        
        self.visited.add(response.url)
        
        links = response.css('a::attr(href)').getall()[:20]
        for link in links:
            full_url = urljoin(response.url, link)
            if full_url not in self.visited:
                yield response.follow(full_url)

# Usage
result = EasyCrawl()
result.start()

Sitemap Crawling

Crawl pages from sitemap.xml (a fallback link-discovery method):

from scrapling.fetchers import Fetcher
from scrapling.spiders import Spider, Response
from urllib.parse import urljoin, urlparse
import re

def get_sitemap_urls(url: str, max_urls: int = 100) -> list:
    """Extract URLs from sitemap.xml - also checks robots.txt."""
    
    parsed = urlparse(url)
    base_url = f"{parsed.scheme}://{parsed.netloc}"
    
    sitemap_urls = [
        f"{base_url}/sitemap.xml",
        f"{base_url}/sitemap-index.xml",
        f"{base_url}/sitemap_index.xml",
        f"{base_url}/sitemap-news.xml",
    ]
    
    all_urls = []
    
    # First check robots.txt for sitemap URL
    try:
        robots = Fetcher.get(f"{base_url}/robots.txt")
        if robots.status == 200:
            sitemap_in_robots = re.findall(r'Sitemap:\s*(\S+)', robots.text, re.IGNORECASE)
            for sm in sitemap_in_robots:
                sitemap_urls.insert(0, sm)
    except Exception:
        pass
    
    # Try each sitemap location
    for sitemap_url in sitemap_urls:
        try:
            page = Fetcher.get(sitemap_url, timeout=10)
            if page.status != 200:
                continue
            
            text = page.text
            
            # Check if it's XML
            if '<?xml' in text or '<urlset' in text or '<sitemapindex' in text:
                urls = re.findall(r'<loc>([^<]+)</loc>', text)
                all_urls.extend(urls[:max_urls])
                print(f"Found {len(urls)} URLs in {sitemap_url}")
        except Exception:
            continue
    
    return list(set(all_urls))[:max_urls]

def crawl_from_sitemap(domain_url: str, max_pages: int = 50):
    """Crawl pages from sitemap."""
    
    print(f"Fetching sitemap for {domain_url}...")
    urls = get_sitemap_urls(domain_url)
    
    if not urls:
        print("No sitemap found. Use EasyCrawl instead!")
        return []
    
    print(f"Found {len(urls)} URLs, crawling first {max_pages}...")
    
    results = []
    for url in urls[:max_pages]:
        try:
            page = Fetcher.get(url, timeout=10)
            results.append({
                'url': url,
                'status': page.status,
                'title': page.css('title::text').get(),
            })
        except Exception as e:
            results.append({'url': url, 'error': str(e)[:50]})
    
    return results

# Usage
print("=== Sitemap Crawl ===")
results = crawl_from_sitemap('https://example.com', max_pages=10)
for r in results[:3]:
    print(f"  {r.get('title', r.get('error', 'N/A'))}")

# Alternative: easy crawl via link discovery (EasyCrawl takes no constructor
# arguments; edit its class attributes to change start_urls or the page limit)
print("\n=== Easy Crawl (Link Discovery) ===")
crawler = EasyCrawl()
crawler.start()

Firecrawl-Style Crawling (best of both worlds)

Modeled on how Firecrawl operates: combine sitemap discovery with link following:

from scrapling.fetchers import Fetcher
from scrapling.spiders import Spider, Response
from urllib.parse import urljoin, urlparse
import re

def firecrawl_crawl(url: str, max_pages: int = 50, use_sitemap: bool = True):
    """
    Firecrawl-style crawling:
    - use_sitemap=True: Discover URLs from sitemap first (default)
    - use_sitemap=False: Only follow HTML links (like sitemap:"skip")
    
    Matches Firecrawl's crawl behavior.
    """
    
    parsed = urlparse(url)
    domain = parsed.netloc
    
    # ========== Method 1: Sitemap Discovery ==========
    if use_sitemap:
        print(f"[Firecrawl] Discovering URLs from sitemap...")
        
        sitemap_urls = [
            f"{url.rstrip('/')}/sitemap.xml",
            f"{url.rstrip('/')}/sitemap-index.xml",
        ]
        
        all_urls = []
        
        # Try sitemaps
        for sm_url in sitemap_urls:
            try:
                page = Fetcher.get(sm_url, timeout=15)
                if page.status == 200:
                    # Handle bytes
                    text = page.body.decode('utf-8', errors='ignore') if isinstance(page.body, bytes) else str(page.body)
                    
                    if '<urlset' in text:
                        urls = re.findall(r'<loc>([^<]+)</loc>', text)
                        all_urls.extend(urls[:max_pages])
                        print(f"[Firecrawl] Found {len(urls)} URLs in {sm_url}")
            except Exception:
                continue
        
        if all_urls:
            print(f"[Firecrawl] Total: {len(all_urls)} URLs from sitemap")
            
            # Crawl discovered URLs
            results = []
            for page_url in all_urls[:max_pages]:
                try:
                    page = Fetcher.get(page_url, timeout=15)
                    results.append({
                        'url': page_url,
                        'status': page.status,
                        'title': page.css('title::text').get() if page.status == 200 else None,
                    })
                except Exception as e:
                    results.append({'url': page_url, 'error': str(e)[:50]})
            
            return results
    
    # ========== Method 2: Link Discovery (sitemap: skip) ==========
    print(f"[Firecrawl] Sitemap skip - using link discovery...")
    
    class LinkCrawl(Spider):
        name = "firecrawl_link"
        start_urls = [url]
        concurrent_requests = 3
        
        def __init__(self):
            super().__init__()
            self.visited = set()
            self.domain = domain
            self.results = []
        
        async def parse(self, response: Response):
            if len(self.results) >= max_pages:
                return
            
            self.results.append({
                'url': response.url,
                'status': response.status,
                'title': response.css('title::text').get(),
            })
            
            # Follow internal links
            links = response.css('a::attr(href)').getall()[:20]
            for link in links:
                full_url = urljoin(response.url, link)
                parsed_link = urlparse(full_url)
                
                if parsed_link.netloc == self.domain and full_url not in self.visited:
                    self.visited.add(full_url)
                    if len(self.visited) < max_pages:
                        yield response.follow(full_url)
    
    result = LinkCrawl()
    result.start()
    return result.results

# Usage
print("=== Firecrawl-Style (sitemap: include) ===")
results = firecrawl_crawl('https://www.cloudflare.com', max_pages=5, use_sitemap=True)
print(f"Crawled: {len(results)} pages")

print("\n=== Firecrawl-Style (sitemap: skip) ===")
results = firecrawl_crawl('https://example.com', max_pages=5, use_sitemap=False)
print(f"Crawled: {len(results)} pages")

Error Handling

from scrapling.fetchers import Fetcher, StealthyFetcher

try:
    page = Fetcher.get('https://example.com')
except Exception as e:
    # Try stealth mode
    page = StealthyFetcher.fetch('https://example.com', headless=True)
    
if page.status == 403:
    print("Blocked - try StealthyFetcher")
elif page.status == 200:
    print("Success!")

Session Management

from scrapling.fetchers import FetcherSession

with FetcherSession(impersonate='chrome') as session:
    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)
    quotes = page.css('.quote .text::text').getall()

Multiple Session Types in a Spider

from scrapling.spiders import Spider, Request, Response
from scrapling.fetchers import FetcherSession, AsyncStealthySession

class MultiSessionSpider(Spider):
    name = "multi"
    start_urls = ["https://example.com/"]
    
    def configure_sessions(self, manager):
        manager.add("fast", FetcherSession(impersonate="chrome"))
        manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)
    
    async def parse(self, response: Response):
        for link in response.css('a::attr(href)').getall():
            if "protected" in link:
                yield Request(link, sid="stealth")
            else:
                yield Request(link, sid="fast", callback=self.parse)

Advanced Parsing & Navigation

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')

# Multiple selection methods
quotes = page.css('.quote')           # CSS
quotes = page.xpath('//div[@class="quote"]')  # XPath
quotes = page.find_all('div', class_='quote')  # BeautifulSoup-style

# Navigation
first_quote = page.css('.quote')[0]
author = first_quote.css('.author::text').get()
parent = first_quote.parent

# Find similar elements
similar = first_quote.find_similar()

Advanced: API Reverse Engineering

"80% of web scraping is reverse engineering."

This section covers advanced techniques for discovering and replicating APIs directly from websites. These techniques often surface "hidden" data that is otherwise locked behind paid APIs.

1. API Endpoint Discovery

Many websites load their data through client-side requests. Use the browser DevTools to find them:

Steps:

  1. Open the browser DevTools (press F12)
  2. Go to the Network tab
  3. Reload the page
  4. Look for XHR/Fetch requests
  5. Check whether an endpoint returns JSON data

What to look for:

  • Requests to /api/* endpoints
  • Responses containing structured data (JSON)
  • The same endpoints serving both free and paid sections

Example pattern:

# Found in Network tab:
GET https://api.example.com/v1/users/transactions
Response: {"data": [...], "pagination": {...}}
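A quick heuristic helps triage captured responses: JSON content types, or bodies that start with { or [, are worth investigating as API endpoints. A small sketch (looks_like_api is a hypothetical helper):

```python
def looks_like_api(content_type: str, body_start: str) -> bool:
    """Heuristic: does a captured response look like a JSON API payload?"""
    if "json" in content_type.lower():
        return True
    # Fall back to inspecting the first non-whitespace character of the body
    return body_start.lstrip()[:1] in ("{", "[")

print(looks_like_api("application/json; charset=utf-8", '{"data": []}'))  # True
print(looks_like_api("text/html", "<!doctype html>"))                     # False
```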

2. JavaScript Analysis

Auth tokens are often generated client-side. Look for them in the .js files:

Steps:

  1. In the Network tab, check the Initiator column
  2. Click through to the .js file that issued the request
  3. Search for the auth header name (e.g., sol-aut, Authorization, X-API-Key)
  4. Find the function that generates the token

Common patterns:

  • Plain function names: generateToken(), createAuthHeader()
  • Obfuscated code: search for the header name directly
  • Random string generation: Math.random(), crypto.getRandomValues()
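The search in step 3 can be automated by downloading the site's .js files (e.g. with Fetcher) and scanning them for known header names. A minimal sketch (find_auth_headers is a hypothetical helper; the header list is just a starting point):

```python
import re

# Auth-header names commonly seen in the wild; extend as needed
AUTH_HEADER_RE = re.compile(
    r'["\'](sol-aut|authorization|x-api-key|x-auth-token)["\']',
    re.IGNORECASE,
)

def find_auth_headers(js_source: str) -> list:
    """Scan downloaded JS source for likely auth-header names."""
    return sorted({m.lower() for m in AUTH_HEADER_RE.findall(js_source)})

js = 'fetch(u, {headers: {"sol-aut": genToken(), "X-API-Key": key}})'
print(find_auth_headers(js))  # ['sol-aut', 'x-api-key']
```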

3. Replicating a Discovered API

Once you've found the endpoint and the auth pattern:

import requests
import random
import string

def generate_auth_token():
    """Replicate discovered token generation logic."""
    chars = string.ascii_letters + string.digits
    token = ''.join(random.choice(chars) for _ in range(40))
    # Insert fixed string at random position
    fixed = "B9dls0fK"
    pos = random.randint(0, len(token))
    return token[:pos] + fixed + token[pos:]

def scrape_api_endpoint(url):
    """Hit discovered API endpoint with replicated auth."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'application/json',
        'sol-aut': generate_auth_token(),  # Replicate discovered header
    }
    
    response = requests.get(url, headers=headers)
    return response.json()

4. Cloudscraper Bypass (Cloudflare)

For Cloudflare-protected endpoints, use cloudscraper:

# pip install cloudscraper
import cloudscraper

def create_scraper():
    """Create a cloudscraper session that bypasses Cloudflare."""
    scraper = cloudscraper.create_scraper(
        browser={
            'browser': 'chrome',
            'platform': 'windows',
            'desktop': True
        }
    )
    return scraper

# Usage
scraper = create_scraper()
response = scraper.get('https://api.example.com/endpoint')
data = response.json()

5. Full API Replication Pattern

import cloudscraper
import random
import string
import json

class APIReplicator:
    """Replicate discovered API from website."""
    
    def __init__(self, base_url):
        self.base_url = base_url
        self.session = cloudscraper.create_scraper()
    
    def generate_token(self, pattern="random"):
        """Replicate discovered token generation."""
        if pattern == "solscan":
            # 40-char random + fixed string at random position
            chars = string.ascii_letters + string.digits
            token = ''.join(random.choice(chars) for _ in range(40))
            fixed = "B9dls0fK"
            pos = random.randint(0, len(token))
            return token[:pos] + fixed + token[pos:]
        else:
            # Generic random token
            return ''.join(random.choices(string.ascii_letters + string.digits, k=32))
    
    def get(self, endpoint, headers=None, auth_header=None, auth_pattern="random"):
        """Make API request with discovered auth."""
        url = f"{self.base_url}{endpoint}"
        
        # Build headers
        request_headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'application/json',
        }
        
        # Add discovered auth header
        if auth_header:
            request_headers[auth_header] = self.generate_token(auth_pattern)
        
        # Merge custom headers
        if headers:
            request_headers.update(headers)
        
        response = self.session.get(url, headers=request_headers)
        return response

# Usage example
api = APIReplicator("https://api.solscan.io")
data = api.get(
    "/account/transactions",
    auth_header="sol-aut",
    auth_pattern="solscan"
)
print(data)

6. Discovery Checklist

When tackling a new website:

Step | Action | Tool
1 | Open the DevTools Network tab | F12
2 | Reload the page, filter by XHR/Fetch | Network filter
3 | Look for JSON responses | Response tab
4 | Check if the same endpoints serve "premium" data | Compare requests
5 | Find the auth header in the JS files | Initiator column
6 | Extract the token-generation logic | JS debugger
7 | Replicate it in Python | Replicator class
8 | Test against the API | Run the script

Brand Data Extraction (Firecrawl alternative)

Extract brand data, colors, logos, and copy from any website:

from scrapling.fetchers import Fetcher
from urllib.parse import urljoin
import re

def extract_brand_data(url: str) -> dict:
    """Extract structured brand data from any website - Firecrawl style."""
    
    # Try a plain fetch first; fall back to stealth mode (handles anti-bot)
    try:
        page = Fetcher.get(url)
    except Exception:
        from scrapling.fetchers import StealthyFetcher
        page = StealthyFetcher.fetch(url, headless=True)
    
    # Helper to get text from element
    def get_text(elements):
        return elements[0].text if elements else None
    
    # Helper to get attribute
    def get_attr(elements, attr_name):
        return elements[0].attrib.get(attr_name) if elements else None
    
    # Brand name (try multiple selectors)
    brand_name = (
        get_text(page.css('[property="og:site_name"]')) or
        get_text(page.css('h1')) or
        get_text(page.css('title'))
    )
    
    # Tagline
    tagline = (
        get_text(page.css('[property="og:description"]')) or
        get_text(page.css('.tagline')) or
        get_text(page.css('.hero-text')) or
        get_text(page.css('header h2'))
    )
    
    # Logo URL
    logo_url = (
        get_attr(page.css('[rel="icon"]'), 'href') or
        get_attr(page.css('[rel="apple-touch-icon"]'), 'href') or
        get_attr(page.css('.logo img'), 'src')
    )
    if logo_url and not logo_url.startswith('http'):
        logo_url = urljoin(url, logo_url)
    
    # Favicon
    favicon = get_attr(page.css('[rel="icon"]'), 'href')
    favicon_url = urljoin(url, favicon) if favicon else None
    
    # OG Image
    og_image = get_attr(page.css('[property="og:image"]'), 'content')
    og_image_url = urljoin(url, og_image) if og_image else None
    
    # Screenshot (using external service)
    screenshot_url = f"https://image.thum.io/get/width/1200/crop/800/{url}"
    
    # Description
    description = (
        get_text(page.css('[property="og:description"]')) or
        get_attr(page.css('[name="description"]'), 'content')
    )
    
    # CTA text
    cta_text = (
        get_text(page.css('a[href*="signup"]')) or
        get_text(page.css('.cta')) or
        get_text(page.css('[class*="button"]'))
    )
    
    # Social links
    social_links = {}
    for platform in ['twitter', 'facebook', 'instagram', 'linkedin', 'youtube', 'github']:
        link = get_attr(page.css(f'a[href*="{platform}"]'), 'href')
        if link:
            social_links[platform] = link
    
    # Features (from feature grid/cards)
    features = []
    feature_cards = page.css('[class*="feature"], .feature-card, .benefit-item')
    for card in feature_cards[:6]:
        feature_text = get_text(card.css('h3, h4, p'))
        if feature_text:
            features.append(feature_text.strip())
    
    return {
        'brandName': brand_name,
        'tagline': tagline,
        'description': description,
        'features': features,
        'logoUrl': logo_url,
        'faviconUrl': favicon_url,
        'ctaText': cta_text,
        'socialLinks': social_links,
        'screenshotUrl': screenshot_url,
        'ogImageUrl': og_image_url
    }

# Usage
brand_data = extract_brand_data('https://example.com')
print(brand_data)

Brand Data CLI

# Extract brand data using the Python function above
python3 -c "
import json
import sys
sys.path.insert(0, '/path/to/skill')
from brand_extraction import extract_brand_data
data = extract_brand_data('$URL')
print(json.dumps(data, indent=2))
"

Feature Comparison

Feature | Status | Notes
Basic fetching | ✅ Works | Fetcher.get()
Stealth fetching | ✅ Works | StealthyFetcher.fetch()
Dynamic fetching | ✅ Works | DynamicFetcher.fetch()
Adaptive parsing | ✅ Works | auto_save + adaptive
Spider crawling | ✅ Works | async def parse()
CSS selectors | ✅ Works | .css()
XPath | ✅ Works | .xpath()
Session management | ✅ Works | FetcherSession, StealthySession
Proxy rotation | ✅ Works | ProxyRotator class
CLI tools | ✅ Works | scrapling extract
Brand data extraction | ✅ Works | extract_brand_data()
API reverse engineering | ✅ Works | APIReplicator class
Cloudscraper bypass | ✅ Works | cloudscraper integration
Easy site crawling | ✅ Works | EasyCrawl class
Sitemap crawling | ✅ Works | get_sitemap_urls()
MCP server | ❌ Excluded | Not needed
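The proxy-rotation row above refers to a ProxyRotator class that is not shown elsewhere in this guide. A minimal round-robin sketch of what it could look like (hypothetical implementation, not part of scrapling):

```python
from itertools import cycle

class ProxyRotator:
    """Round-robin over a proxy pool (hypothetical helper)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._pool = cycle(proxies)

    def next(self) -> str:
        """Return the next proxy URL in rotation."""
        return next(self._pool)

rotator = ProxyRotator(["http://p1:8080", "http://p2:8080"])
print([rotator.next() for _ in range(3)])
# ['http://p1:8080', 'http://p2:8080', 'http://p1:8080']
```

If the fetcher you use accepts a per-request proxy argument (check the current scrapling signature), each call can then pass rotator.next().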

Tested Examples

IEEE Spectrum

page = Fetcher.get('https://spectrum.ieee.org/...')
title = page.css('h1::text').get()
content = page.css('article p::text').getall()

✅ Works

Hacker News

page = Fetcher.get('https://news.ycombinator.com')
stories = page.css('.titleline a::text').getall()

✅ Works

Example Domain

page = Fetcher.get('https://example.com')
title = page.css('h1::text').get()

✅ Works


🔧 Quick Troubleshooting

Problem | Solution
403/429 blocked | Use StealthyFetcher or cloudscraper
Cloudflare | Use StealthyFetcher or cloudscraper
JavaScript required | Use DynamicFetcher
Site changed | Use adaptive=True
Paid API exposed | Use API reverse engineering
CAPTCHA | Can't bypass: skip it or use the official API
Login required | Don't bypass: use the official API
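The escalation path in the table (plain fetch, then stealth, then cloudscraper) can be wrapped in a single helper. A sketch with a generic strategy list (fetch_with_fallback is hypothetical; plug in Fetcher.get, StealthyFetcher.fetch, or a cloudscraper session as strategies):

```python
def fetch_with_fallback(url, strategies):
    """Try each (name, fetch_fn) strategy in order; return the first 200 response."""
    last_error = None
    for name, fetch in strategies:
        try:
            page = fetch(url)
            if getattr(page, "status", None) == 200:
                return name, page
        except Exception as exc:  # blocked or network error: escalate
            last_error = exc
    raise RuntimeError(f"all strategies failed for {url}") from last_error

# Example wiring (assumes the scrapling imports from earlier sections):
# strategies = [("plain", Fetcher.get),
#               ("stealth", lambda u: StealthyFetcher.fetch(u, headless=True))]
# name, page = fetch_with_fallback("https://example.com", strategies)
```

Ordering cheap strategies first keeps the common case fast and reserves browser automation for pages that actually need it.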

Skill Graph

Related skills:

  • [[content-research]] - research pipeline
  • [[blogwatcher]] - RSS/feed monitoring
  • [[youtube-watcher]] - video content
  • [[chirp]] - Twitter/X interaction
  • [[newsletter-digest]] - content digests
  • [[x-tweet-fetcher]] - X/Twitter (use instead of Scrapling)

Changelog

v1.0.8 (2026-02-25)

  • Added: Firecrawl-style crawling - combines sitemap discovery with link following
  • Added: use_sitemap parameter - matches Firecrawl's sitemap "include"/"skip" behavior
  • Verified: cloudflare.com returns 2,447 URLs from its sitemap!

v1.0.7 (2026-02-25)

  • Fixed: EasyCrawl spider syntax - updated to match scrapling's actual Spider API
  • Verified: crawling works - tested crawling 20+ pages on example.com

v1.0.6 (2026-02-25)

  • Added: easy site crawling - auto-crawl every page on a domain via the EasyCrawl spider
  • Added: sitemap crawling - extract URLs from sitemap.xml and crawl them
  • Feature parity with Firecrawl for site crawling

v1.0.5 (2026-02-25)

  • Enhanced: API reverse-engineering methodology
    • Detailed step-by-step process based on @paoloanzn's work
    • Real Solscan case study with an exact timeline
    • Added: step-by-step methodology section
    • Added: real-example documentation (Solscan, March 2025 vs February 2026)
    • Added: 10-step discovery checklist
    • Documented: how to find auth headers in JS files
    • Documented: token-generation pattern extraction
    • Updated: cloudscraper integration with a multi-attempt pattern
    • Verified: Solscan is now patched (Cloudflare enabled on both endpoints)

v1.0.4 (2026-02-25)

  • Fixed: brand data extraction API - corrected selectors for scrapling's Response object
  • Fixed: .html → .text/.body
  • Fixed: .title() → page.css('title')
  • Fixed: .logo img::src → .logo img::attr(src)
  • Tested and verified working

v1.0.3 (2026-02-25)

  • Added: API reverse-engineering section
    • API endpoint discovery (Network tab analysis)
    • JavaScript analysis (finding auth logic)
    • Cloudscraper integration for Cloudflare bypass
    • Full APIReplicator class
    • Discovery checklist
  • Added cloudscraper to the installation section

v1.0.2 (2026-02-25)

  • Fully synced with the upstream GitHub README
  • Added the brand data extraction section
  • Lean, core-only version

v1.0.1 (2026-02-25)

  • Synced with the original Scrapling GitHub README

Last updated: 2026-02-25
