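Preface
To make my navigation site more visual, I needed to batch-capture homepage screenshots of a large number of linked websites and save them to Notion.
Approach
I used LLMs to turn natural-language requests into an automation script, with Chrome DevTools as the screenshot tool, Cloudinary as the external image host, and the Notion API as the interface.
Result
Script run demo
Implementation process
Six rounds of conversation with Claude to build the initial script
Conversation 1: Basics, how to take screenshots with the Microlink API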
K
want to use API to take screenshots of multiple URLs. But I can't understand the content below. Can you explain this and help me learn to use microlink API to take screenshots?
{
screenshot
Type: <boolean> Default: false
It generates a screenshot over the target url.
* CLI
* JavaScript
* Shell
* Python
* Ruby
When it's enabled, a new screenshot data field will be part of the response payload. You can configure different specific options, such as element or overlay:
* CLI
* JavaScript
* Shell
* Python
* Ruby
Also, combine it with embed for inserting it as HTML markup and refresh it asynchronously in the background (known as stale).
or inside CSS stylesheets:
even in Markdown:
}
Note: the content in {*} is from Turns websites into data — Microlink
Claude
I'd be happy to explain how to use the Microlink API to take screenshots of multiple URLs. Let's break this down step by step:
- Understanding the Microlink API:
Microlink is a service that allows you to extract information from any website, including taking screenshots. The API endpoint for this service is https://api.microlink.io.
- Basic Usage: To take a screenshot of a URL, you need to make a GET request to the Microlink API with the following parameters:
- url: The website URL you want to screenshot
- screenshot: Set to true to enable screenshot generation
Example:
- API Response: When successful, the API will return a JSON response containing the screenshot information, including the URL of the generated screenshot, its dimensions, file type, and size.
- Taking Screenshots of Multiple URLs: To screenshot multiple URLs, you'll need to make separate API calls for each URL. This can be done using a script in your preferred programming language.
Here's a simple Python script to demonstrate how to take screenshots of multiple URLs:
This script does the following:
- Defines a function take_screenshot that makes a request to the Microlink API for a given URL.
- If successful, it extracts the screenshot URL from the API response.
- We then loop through a list of URLs, calling take_screenshot for each one.
- The script prints the screenshot URL for each successful request, or an error message for failed requests.
To use this script, you'll need to install the requests library (pip install requests) if you haven't already.
Additional Notes:
- The API offers more customization options, such as specifying the screenshot dimensions, format, and more. These can be added as additional parameters to the API call.
- For high-volume usage, you may need to sign up for a Microlink account and use authentication.
- Be mindful of rate limits and terms of service when using the API.
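The script from this reply did not survive in the post, so here is a minimal sketch of the same idea. It assumes the endpoint and the url/screenshot parameters described above and the data.screenshot.url response path that appears in the traceback later in this post. It uses only the standard library (urllib) instead of the requests library mentioned in the reply, so it runs without extra installs. The helper names build_request_url and extract_screenshot_url are illustrative, not part of Microlink.

```python
import json
import urllib.parse
import urllib.request

MICROLINK_ENDPOINT = "https://api.microlink.io"


def build_request_url(target_url):
    """Build the GET URL asking Microlink to screenshot target_url."""
    query = urllib.parse.urlencode({"url": target_url, "screenshot": "true"})
    return f"{MICROLINK_ENDPOINT}?{query}"


def extract_screenshot_url(payload):
    """Pull the image URL out of a decoded Microlink JSON response.

    Raises KeyError when the response has no screenshot field.
    """
    return payload["data"]["screenshot"]["url"]


def take_screenshot(target_url, timeout=30):
    """Call the API (network access required) and return the screenshot URL."""
    with urllib.request.urlopen(build_request_url(target_url), timeout=timeout) as resp:
        return extract_screenshot_url(json.load(resp))
```

Looping over a list of sites is then a matter of calling take_screenshot(site) for each one, for example `for site in urls: print(site, take_screenshot(site))`.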
Would you like me to explain any part of this in more detail or help you implement this in a different programming language?
Conversation 2: Closing the loop, how to send screenshots to Notion through the Notion API
K
After that ,how to connect with Notion API and put this screenshot into a specific page which is in a Notion database?
Claude
Certainly! I can help you connect the Microlink screenshot functionality with the Notion API to add the screenshots to a specific page in a Notion database. Let's break this down into steps:
- Set up Notion API
- Create a function to add content to a Notion page
- Combine the screenshot and Notion functionality
Here's how we can do this:
- Set up Notion API: First, you need to set up your Notion integration and get your API key. Here are the steps:
- Click "New integration" and give it a name
- Select the workspace where you'll use it
- Copy the "Internal Integration Token" (this is your API key)
- In your Notion workspace, share the database with your integration
- Create a function to add content to a Notion page: We'll create a function that adds a new block (in this case, an image) to a specific Notion page.
- Combine the screenshot and Notion functionality: We'll modify our previous script to include the Notion integration.
Here's the updated Python script that accomplishes this:
To use this script, you need to:
- Install the required Python library:
- Replace "your_notion_api_key_here" with your actual Notion API key.
- Replace "your_notion_page_id_here" with the ID of the Notion page where you want to add the screenshots. You can find this ID in the URL of your Notion page.
- Customize the urls list with the websites you want to screenshot.
This script does the following:
- It uses the take_screenshot function from our previous example to get screenshot URLs from Microlink.
- It defines a new function add_image_to_notion_page that uses the Notion API to add an image block to a specific page.
- In the main function, it loops through each URL, takes a screenshot, and then adds that screenshot to the specified Notion page.
A few important notes:
- This script adds each screenshot as a new block at the end of the specified Notion page. If you want to add them to a database instead, you'll need to modify the Notion API call to create new pages in the database.
- The Notion API has rate limits, so if you're processing many URLs, you might need to add delays between requests.
- Error handling in this script is basic. In a production environment, you'd want to add more robust error handling and possibly retries for failed requests.
- Make sure to keep your Notion API key secure and never share it publicly.
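The script itself is missing from the post. As a rough sketch of what add_image_to_notion_page could look like, the code below appends an external image block to a page through the Notion "append block children" endpoint. The API version string, the helper names notion_headers and image_block, and the use of urllib instead of requests are choices made for this sketch, not taken from the original conversation.

```python
import json
import urllib.request

NOTION_VERSION = "2022-06-28"  # assumed API version for this sketch


def notion_headers(api_key):
    """Headers the Notion API expects on every request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Notion-Version": NOTION_VERSION,
        "Content-Type": "application/json",
    }


def image_block(image_url):
    """Block object describing an externally hosted image."""
    return {
        "object": "block",
        "type": "image",
        "image": {"type": "external", "external": {"url": image_url}},
    }


def add_image_to_notion_page(api_key, page_id, image_url, timeout=30):
    """Append one image block to the end of the page (network access required)."""
    body = json.dumps({"children": [image_block(image_url)]}).encode("utf-8")
    request = urllib.request.Request(
        f"https://api.notion.com/v1/blocks/{page_id}/children",
        data=body,
        headers=notion_headers(api_key),
        method="PATCH",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

The page must be shared with the integration, as described in the setup steps, or the request is rejected.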
Would you like me to explain any part of this in more detail or help you customize it further?
Conversation 3: Optimization, how to screenshot the site in each Notion database page's URL property and store it in that page's File property
K
There are a lot of page in a notion database. They all have url property. If I can take screenshots of urls which is in url property and put them into the same page's file property?
Claude
Certainly! I understand you want to:
- Access a Notion database
- For each page in the database, get the URL from a "url" property
- Take a screenshot of that URL
- Add the screenshot to the same page's "file" property
This is a great use case. Let's modify our script to accomplish this task. Here's how we can do it:
To use this script:
- Install the required Python library if you haven't already:
- Replace "your_notion_api_key_here" with your actual Notion API key.
- Replace "your_database_id_here" with the ID of your Notion database. You can find this ID in the URL of your database page.
Here's what this script does:
- get_database_pages: This function retrieves all pages from the specified Notion database.
- update_page_with_screenshot: This function updates a specific page in the Notion database, adding the screenshot URL to the "file" property.
- In the main function:
- It gets all pages from the database.
- For each page, it checks for a "url" property.
- If a URL is found, it takes a screenshot using the Microlink API.
- It then updates the page, adding the screenshot URL to the "file" property.
- There's a 1-second delay between processing each page to respect rate limits.
Important notes:
- Error handling: The script includes basic error handling. In a production environment, you might want to add more robust error handling and logging.
- Rate limits: Both Notion and Microlink have rate limits. The script includes a basic 1-second delay between operations, but you might need to adjust this based on your specific usage and the rate limits of your account.
- Pagination: If your database has more than 100 pages, you'll need to implement pagination in the get_database_pages function to retrieve all pages.
- Property names: The script assumes your URL property is named "url" and your file property is named "file". If these are named differently in your database, you'll need to adjust the property names in the script.
- Existing files: This script will add a new file to the "file" property. If you want to replace an existing file or handle cases where there might already be files, you'll need to modify the update_page_with_screenshot function.
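Since the script is not reproduced here, the following sketch illustrates the two Notion-specific pieces: reading the "url" property of a page object and writing an external file into its "file" property via the "update page" endpoint. The property names follow the assumption stated in the reply; get_page_url, file_property_payload, and the default file name are hypothetical helpers for this sketch. Note that sending a files list this way sets the property to exactly that list.

```python
import json
import urllib.request


def get_page_url(page, prop="url"):
    """Return the value of a URL-type property, or None if missing or empty."""
    value = page.get("properties", {}).get(prop) or {}
    return value.get("url")


def file_property_payload(screenshot_url, prop="file", name="screenshot.png"):
    """Request body that points a Files property at an external image URL."""
    return {
        "properties": {
            prop: {
                "files": [
                    {"name": name, "type": "external", "external": {"url": screenshot_url}}
                ]
            }
        }
    }


def update_page_with_screenshot(api_key, page_id, screenshot_url, timeout=30):
    """PATCH the page so its file property holds the screenshot (network required)."""
    request = urllib.request.Request(
        f"https://api.notion.com/v1/pages/{page_id}",
        data=json.dumps(file_property_payload(screenshot_url)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Notion-Version": "2022-06-28",  # assumed API version
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```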
Would you like me to explain any part of this further or help you with any modifications?
Conversation 4: Optimization, how to process 50 pages per run and skip pages whose File property already contains a screenshot
K
My database has more than 100 pages. And I want it run if the file property is empty. Can I take screenshots 50 times once and it can jump over which have already had screenshot in file property?So that I can take screenshots of all my 200+ pages by running a couple of times. Help me improve it.
Claude
Certainly! I understand your requirements:
- Handle pagination for databases with more than 100 pages
- Only take screenshots if the file property is empty
- Limit the script to process 50 pages at a time
- Skip pages that already have screenshots
Let's modify the script to accommodate these requirements. Here's an improved version:
Here's what's new in this improved version:
- Pagination: The get_database_pages function now supports pagination. It returns the next_cursor and has_more flag along with the results.
- Limit processing: The script now has a max_screenshots variable (set to 50) to limit the number of screenshots taken in one run.
- Empty file property check: The script now checks if the file property is empty before processing a page.
- Cursor-based iteration: The main loop now uses a cursor to iterate through all pages in the database, continuing where it left off in the previous run.
To use this script:
- Replace "your_notion_api_key_here" with your actual Notion API key.
- Replace "your_database_id_here" with your database ID.
- Run the script multiple times to process all pages in your database.
Key points:
- The script will process up to 50 pages that need screenshots in each run.
- It will skip pages that already have files in the file property.
- You can run this script multiple times to eventually process all pages in your database.
- The script respects rate limits by waiting 1 second between operations.
To process all 200+ pages in your database:
- Run the script once. It will process up to 50 pages.
- Wait a bit (maybe a few minutes) to respect rate limits.
- Run the script again. It will start where it left off and process up to 50 more pages.
- Repeat steps 2-3 until all pages have been processed.
This approach allows you to process your entire database in manageable chunks, respecting rate limits and avoiding unnecessary API calls for pages that already have screenshots.
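The batching logic can be sketched as follows, assuming the "query a database" endpoint returns results, has_more, and next_cursor as described above. In this sketch nothing is saved between runs: a later run starts from the first page again and simply skips pages whose file property is already filled, which has the same practical effect. collect_pending and has_screenshot are hypothetical names, and fetch_batch is injected so the selection logic can be tried without network access.

```python
import json
import urllib.request


def query_database(api_key, database_id, cursor=None, timeout=30):
    """Fetch one batch of up to 100 pages from the database (network required)."""
    body = {"page_size": 100}
    if cursor:
        body["start_cursor"] = cursor
    request = urllib.request.Request(
        f"https://api.notion.com/v1/databases/{database_id}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Notion-Version": "2022-06-28",  # assumed API version
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def has_screenshot(page, prop="file"):
    """True when the page's Files property already holds at least one file."""
    files = (page.get("properties", {}).get(prop) or {}).get("files") or []
    return len(files) > 0


def collect_pending(fetch_batch, limit=50):
    """Walk the database batch by batch and return up to `limit` pages without a file.

    fetch_batch(cursor) must return a dict with results, has_more, next_cursor.
    """
    pending, cursor = [], None
    while len(pending) < limit:
        batch = fetch_batch(cursor)
        for page in batch["results"]:
            if not has_screenshot(page):
                pending.append(page)
                if len(pending) == limit:
                    break
        if not batch.get("has_more"):
            break
        cursor = batch.get("next_cursor")
    return pending
```

A run would then call something like `collect_pending(lambda c: query_database(API_KEY, DATABASE_ID, c))` and screenshot each returned page, where API_KEY and DATABASE_ID stand for your own values.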
Would you like me to explain any part of this further or help you with any additional modifications?
Conversation 5: Troubleshooting, how to skip pages that require login
K
{
Screenshot taken: https://iad.microlink.io/e3bFoUcPA2-yv0mffLhp9N3gAeVNI0XeYUcXMOQLqh2TI9oQ6vuSVsoRvv1N6uW5r2s4y4qjto0XMLm5wF-_Kg.png
Screenshot added to Notion page
Processing URL: https://cloud.mongodb.com/v2#/org/663ee0a28e5df13806dad550/projects
Traceback (most recent call last):
File "C:\Users\Kinpoe·Ray\Downloads\screenshot-to-notion.py", line 115, in <module>
main()
File "C:\Users\Kinpoe·Ray\Downloads\screenshot-to-notion.py", line 86, in main
screenshot_url = take_screenshot(url)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Kinpoe·Ray\Downloads\screenshot-to-notion.py", line 12, in take_screenshot
return data['data']['screenshot']['url']
~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'screenshot'
}
I meet someting erroe like this . If I can jump over this kind of url in script? Or if you have ways to take screenshots of this kind of url?
Claude
I understand you're encountering an error when trying to take a screenshot of certain URLs, specifically for https://cloud.mongodb.com/v2#/org/663ee0a28e5df13806dad550/projects in this case. This error occurs because the Microlink API couldn't generate a screenshot for this URL, likely due to authentication requirements or other access restrictions.
Let's modify the script to handle these cases gracefully. We'll add error handling to skip URLs that can't be screenshotted and optionally log them for manual review. Here's an improved version of the take_screenshot function and some modifications to the main loop:
Key changes in this updated script:
- Error Handling: The take_screenshot function now has better error handling. It will catch any RequestException (which includes connection errors, timeouts, etc.) and log the error.
- Logging: We've added logging functionality. Errors will be written to a file named screenshot_errors.log in the same directory as the script.
- Skipping Problematic URLs: If a screenshot can't be taken (either due to an error or because the API response doesn't contain a screenshot URL), the script will log the issue and continue to the next URL.
- Graceful Continuation: Even if a screenshot fails, the script will continue processing other URLs.
To use this updated script:
- Replace "your_notion_api_key_here" with your actual Notion API key.
- Replace "your_database_id_here" with your database ID.
- Run the script as before.
This improved version will:
- Skip over URLs that can't be screenshotted (like the MongoDB URL you encountered)
- Log errors to screenshot_errors.log for your review
- Continue processing other URLs even if some fail
After running the script, you can check the screenshot_errors.log file to see which URLs couldn't be screenshotted. For these URLs, you might need to:
- Manually take screenshots if they require authentication
- Review if they're internal URLs that the Microlink API can't access
- Check if they're valid URLs or if they need updating in your Notion database
Remember, some URLs (especially those requiring authentication or accessing internal systems) may not be accessible to the Microlink API. For these, you might need to explore alternative screenshot methods or manually add screenshots to your Notion pages.
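The KeyError in the traceback comes from indexing a response that has no screenshot field. A minimal sketch of the skip-and-log behaviour described above, assuming the same data.screenshot.url response path, might look like this. safe_screenshot_url, configure_error_log, and screenshot_all are hypothetical names for this sketch; fetch is any callable that returns the decoded API response for a URL.

```python
import logging

logger = logging.getLogger("screenshot_errors")


def configure_error_log(path="screenshot_errors.log"):
    """Send error records to a log file next to the script."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.ERROR)


def safe_screenshot_url(payload, source_url):
    """Return the screenshot URL, or None (and log) when the response has none."""
    try:
        return payload["data"]["screenshot"]["url"]
    except (KeyError, TypeError):
        logger.error("No screenshot returned for %s", source_url)
        return None


def screenshot_all(urls, fetch):
    """Map each URL to its screenshot URL, skipping the ones that fail."""
    results = {}
    for url in urls:
        try:
            shot = safe_screenshot_url(fetch(url), url)
        except OSError as err:  # covers urllib network errors and timeouts
            logger.error("Request failed for %s: %s", url, err)
            continue
        if shot is not None:
            results[url] = shot
    return results
```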
Is there anything else you'd like me to explain or modify in the script?
Conversation 6: Upgrade, using Chrome DevTools for automated screenshots
K
{
How to screenshot a webpage right in Chrome
Taking a screenshot of an entire webpage can be incredibly useful, whether you're capturing a bug to report, saving a snapshot for reference, or gathering information for research. While there are various tools and extensions available for this purpose, Google Chrome's Developer Tools offer a built-in, straightforward method to capture full-page screenshots without the need for additional software. This guide will walk you through the steps to utilize this feature effectively.
We give an overview of more manual ways to screenshot webpages in our post here.
Access Chrome developer tools
To begin capturing a screenshot using Chrome's Developer Tools, you first need to access the tools themselves. This can be done through keyboard shortcuts or the Chrome menu.
1. Using Keyboard Shortcuts:
- On a Mac, press Cmd + Opt + I.
- On Windows, press Ctrl + Shift + I.
2. Using Chrome Menu:
- Click the three-dot menu icon at the top-right corner of Chrome.
- Navigate to More Tools > Developer Tools.
Taking a full-page screenshot
Once you have the Developer Tools open, you're just a few steps away from capturing a full-page screenshot.
1. Open the Command Menu:
- On a Mac, press Cmd + Shift + P.
- On Windows, press Ctrl + Shift + P.
2. In the Command Menu, type "screenshot" to bring up the screenshot options.
3. Select Capture area screenshot/Capture full size screenshot/Capture node screenshot/Capture screenshot from the list of options. Chrome will then capture the webpage.}
How can I use this way to automatically take screenshots of multiple web pages to Notion ?
Note: the content in {*} is from Bardeen
Claude
To automate taking screenshots of multiple webpages using Chrome's Developer Tools method and then upload them to Notion, we'll need to use a combination of browser automation (with Selenium or Playwright) and the Notion API. This approach will allow us to simulate the process of opening Developer Tools, running commands, and capturing screenshots programmatically. Here's how we can modify our script to accomplish this:
This script uses Selenium to automate Chrome and take screenshots using the Developer Tools method you described. Here's a breakdown of the key changes and how to use this script:
- Setup:
- Install required libraries: pip install selenium requests
- Download ChromeDriver that matches your Chrome version and update the path in setup_chrome_driver()
- Key Functions:
- setup_chrome_driver(): Initializes a headless Chrome browser
- take_screenshot(): Uses Selenium to open Developer Tools and capture a full-page screenshot
- upload_to_notion(): Uploads the screenshot to Notion and updates the page
- Usage:
- Replace "your_notion_api_key_here" with your Notion API key
- Replace "your_database_id_here" with your Notion database ID
- Run the script
- Features:
- Takes full-page screenshots using Chrome's Developer Tools
- Uploads screenshots directly to Notion
- Handles pagination for large databases
- Skips pages that already have files
- Logs errors for review
- Limitations and Considerations:
- This method may be slower than using a dedicated screenshot API
- It requires Chrome and ChromeDriver to be installed
- Some websites may detect and block automated browsers
- Error handling for complex web applications might need refinement
To use this script:
- Ensure you have Chrome installed
- Download the appropriate ChromeDriver for your Chrome version
- Install the required Python libraries
- Update the Notion API key and database ID in the script
- Run the script
This approach should work for most websites, including those that were problematic with the Microlink API. However, it may still face challenges with websites that require authentication or have complex dynamic content.
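The Selenium script is not shown in the post, and it is unclear exactly how it drove DevTools. One way to get a comparable full-page capture, sketched here as an assumption rather than as the script from the conversation, is to call the Chrome DevTools Protocol commands Page.getLayoutMetrics and Page.captureScreenshot through Selenium's execute_cdp_cmd. The function names capture_full_page and save_png are made up for this sketch. Selenium is imported inside the function so the decoding helper can be used without it, and recent Selenium releases locate a matching driver on their own.

```python
import base64
from pathlib import Path


def save_png(cdp_result, out_path):
    """Decode the base64 'data' field of a captureScreenshot result and write it."""
    out_path = Path(out_path)
    out_path.write_bytes(base64.b64decode(cdp_result["data"]))
    return out_path


def capture_full_page(url, out_path):
    """Open url in headless Chrome and save a full-page PNG (Chrome required)."""
    # Imported lazily so save_png stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1280,800")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Ask DevTools for the size of the whole document, not just the viewport.
        metrics = driver.execute_cdp_cmd("Page.getLayoutMetrics", {})
        size = metrics.get("cssContentSize") or metrics["contentSize"]
        result = driver.execute_cdp_cmd(
            "Page.captureScreenshot",
            {
                "format": "png",
                "captureBeyondViewport": True,
                "clip": {
                    "x": 0,
                    "y": 0,
                    "width": size["width"],
                    "height": size["height"],
                    "scale": 1,
                },
            },
        )
        return save_png(result, out_path)
    finally:
        driver.quit()
```

Pages that load content lazily may need a short wait after driver.get before the capture.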
Would you like me to explain any part of this script in more detail or help you with any modifications?
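That was the first stage of conversations, in which I moved from the Microlink API to using Chrome DevTools directly.
The reason: the free Microlink API tier allows only 50 requests per day, whereas Chrome DevTools has no such limit.
Conversations with ChatGPT and Mistral to reach the final result
After I used up Claude's free quota, I handed the code and its runtime problems over to ChatGPT.
Optimization: upload the images to an external service. Here I used Cloudinary.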
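For illustration only, an upload step with the Cloudinary Python SDK could look like the sketch below. It assumes `pip install cloudinary` and credentials from your own Cloudinary account. upload_screenshot and hosted_url are hypothetical helper names and are not taken from the final repository.

```python
def hosted_url(upload_result):
    """Pick the HTTPS URL from a Cloudinary upload response, falling back to 'url'."""
    return upload_result.get("secure_url") or upload_result["url"]


def upload_screenshot(image_path, cloud_name, api_key, api_secret):
    """Upload a local screenshot and return its public URL (network required)."""
    # Imported lazily so hosted_url can be used without the SDK installed.
    import cloudinary
    import cloudinary.uploader

    cloudinary.config(
        cloud_name=cloud_name,
        api_key=api_key,
        api_secret=api_secret,
        secure=True,
    )
    return hosted_url(cloudinary.uploader.upload(image_path))
```

The returned URL is what then goes into the Notion page's file property as an external file.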
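The upgraded code became too long for the free tiers of both Claude and ChatGPT.
I remembered reading that Mistral Large 123B comes very close to Meta Llama 405B in capability and is even stronger at coding, so I switched to Le Chat - Mistral AI for the conversation.
Final code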
Chrome-Screenshot-to-Notion
kinpoe-ray • Updated Sep 20, 2024
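Common problems
Error 1: a few websites show a warning popup with terms of use and privacy policy information; handle these manually
The final script still runs into the following problem on a very small number of sites. Of the two sites here, one is an LLM arena and the other detects whether content is AI or human.
According to the error message, the script hit an unexpected alert popup while trying to screenshot certain sites (such as https://chat.lmsys.org/ and https://www.aiornot.com/). These popups contain terms of use and privacy policy information and require the user to agree before continuing to the site.
Specifically, these sites show an alert on load with roughly the following content:
- The service is a research preview
- It provides only limited safety measures and may generate offensive content
- It must not be used for any illegal, harmful, violent, racist, or sexual purposes
- Please do not upload any private information
- The service collects user dialogue data (including text and images) and reserves the right to distribute it under a Creative Commons Attribution (CC-BY) or similar license
The automation script cannot handle this kind of unexpected alert, so it cannot go on to take the screenshot, which causes the error.
Sites like these need to be screenshotted and uploaded manually.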
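Error 2: Notion API connection limit; wait a while and run again
On the third run, the following connection problem appeared. My guess is that the Notion API limits the number of connections, so waiting a while and trying again should be enough.
After waiting a while, it ran again.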
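If the cause really is rate limiting, the waiting can be automated. Notion signals rate-limited requests with HTTP 429 and a Retry-After header, so a retry wrapper along the following lines might help. This is a sketch assuming the standard-library urllib is used for the API calls and that Retry-After is given in seconds. call_with_retry is a hypothetical helper, not part of the final script.

```python
import time
import urllib.error


def call_with_retry(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on HTTP 429 or a connection error, wait and try again.

    The wait is the server's Retry-After value when present, otherwise an
    exponential backoff of base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Only rate-limit responses are retried; other HTTP errors propagate.
            if err.code != 429 or attempt == max_attempts:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            delay = float(retry_after) if retry_after else base_delay * 2 ** (attempt - 1)
        except urllib.error.URLError:
            # Connection-level failure (reset, DNS, timeout wrapped by urllib).
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)
```

Any Notion request can then be wrapped as `call_with_retry(lambda: some_request())`, where some_request stands for whichever API call the script makes.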
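Closing thoughts
One last note: to make sharing the code easier, I spent half a day learning Git, GitHub, and VS Code, then uploaded the first piece of code I organized and published myself, and wrote a README based on a reference template. In my next blog post I will share that whole process, including learning resources.
As a designer (currently going all in on AI), I completed a web-page screenshot automation script this time just by interacting with AI assistants such as Claude, ChatGPT, and Mistral…
Choosing the right LLM based on each model's capability boundaries, defining the problem clearly, guiding the AI, and verifying and refining its output may well become the most common basic skill of the future.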