In this blog post, we will walk through creating a simple yet powerful MindStudio worker that monitors a website for updates and alerts you via email when new content is detected. For this example, we’ll pretend you’re a journalist tracking layoff or closure notices issued by the state of New York.
What We’re Building
Our goal: Monitor a website for new layoff or closure notices, extract data from PDFs, and summarize the key information in an email.
This worker will automatically run every hour and provide:
- Headlines and teasers
- Relevant metadata
- A three-bullet summary
- A link to the source PDF
Step 1: Generate a New Workflow
- Open MindStudio and create a new workflow.
- In the workflow request, specify:
  - Schedule: Run every hour.
  - Task:
    - Check a specific URL for new notices.
    - Scrape the PDF linked to each new notice.
    - Generate a high-level report summarizing the content.
    - Send this report as an email.
- Click Generate to let MindStudio build the workflow.
At this stage, MindStudio will propose the architecture for your workflow.
Step 2: Understand the Workflow Architecture
After generating, review the proposed workflow components:
- Global Variable: Keeps track of previously processed notices.
- Schedule: Runs hourly at the start of each hour.
- Scraper: Scrapes the notice page and identifies linked PDFs.
- Logic Block: Determines if there are new notices.
- PDF Processor: Extracts text content from PDFs.
- Email Sender: Formats the extracted content into Markdown and sends it via email.
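To make the Scraper and Logic Block roles more concrete, here is a minimal Python sketch of the same idea outside MindStudio. It is not MindStudio's implementation: the names `PdfLinkParser` and `find_new_pdfs` and the `example.gov` URLs are hypothetical, and the `seen` set stands in for the global variable that tracks processed notices.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects the absolute URL of every <a href> that points to a .pdf file."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension
        if href and href.lower().split("?")[0].endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))


def find_new_pdfs(html: str, base_url: str, seen: set[str]) -> list[str]:
    """Return PDF links on the page that are not in `seen`, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    new_links: list[str] = []
    for link in parser.pdf_links:
        if link not in seen and link not in new_links:
            new_links.append(link)
    return new_links


# Placeholder page and URLs, for illustration only
page = (
    '<a href="/notices/a.pdf">Notice A</a>'
    '<a href="/notices/b.pdf">Notice B</a>'
    '<a href="/about">About</a>'
)
seen = {"https://example.gov/notices/a.pdf"}
print(find_new_pdfs(page, "https://example.gov/list", seen))
# prints ['https://example.gov/notices/b.pdf']
```

An empty result corresponds to the Logic Block deciding there is nothing new to process.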
If everything looks good, proceed to build the workflow.
Step 3: Explore the Generated Workflow
- System Prompt:
  - MindStudio generates a system prompt that aligns with your request.
- Automation Schedule:
  - The automation runs hourly and follows this sequence:
    - Load Previously Processed Notices using a custom function.
    - Scrape the Website for any new notices.
    - Check for New Content using a logic block.
- Processing the PDFs:
  - If new notices are found, the system:
    - Extracts text content from the linked PDF.
    - Summarizes and formats the information into a readable email.
    - Sends the result via email.
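The sequence above can be sketched as a single function. This is only an illustration of the control flow, not the code MindStudio generates: `run_once` and its parameters are hypothetical names, and the scraping, PDF extraction, summarizing, and email steps are passed in as plain callables so the sketch stays self-contained.

```python
from typing import Callable


def run_once(
    seen: set[str],
    scrape: Callable[[], list[str]],
    extract_text: Callable[[str], str],
    summarize: Callable[[str], str],
    send_email: Callable[[str], None],
) -> set[str]:
    """Run one hourly pass and return the updated set of processed notice URLs."""
    found = scrape()
    new = [url for url in found if url not in seen]
    if not new:
        # Logic block: nothing new, so no PDF processing and no email
        return seen
    sections = [summarize(extract_text(url)) for url in new]
    send_email("\n\n".join(sections))
    return seen | set(new)


# Stub steps, for illustration only
outbox: list[str] = []
state = run_once(
    seen={"a.pdf"},
    scrape=lambda: ["a.pdf", "b.pdf"],
    extract_text=lambda url: f"text of {url}",
    summarize=lambda text: text.upper(),
    send_email=outbox.append,
)
```

Returning the updated set, rather than mutating it in place, mirrors the idea of writing the result back to the global variable at the end of each run.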
Step 4: Run the Workflow in the Debugger
To test your setup:
- Run the workflow in Debug Mode to simulate the process.
- Observe the following steps:
  - Scraping the website for notices.
  - Identifying new PDFs.
  - Extracting and processing text content.
  - Generating the summarized results.
- Check the debug logs to confirm that the workflow executed each step correctly.
Step 5: Verify the Results
Once the workflow runs successfully:
- Open your email inbox.
- Verify that you received a report email containing, for each new notice:
  - A headline and a teaser
  - Metadata and a three-bullet summary
  - A link to the source PDF
Example of the final output in email format:
**Headline 1:** Company XYZ Announces Layoff
**Teaser:** 500 employees will be affected starting next month.
**Summary:**
- Reason: Financial restructuring
- Impacted Locations: New York, NY
- Effective Date: July 1, 2024
[Read Full Notice (PDF)](URL)
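As a rough sketch of how one notice could be turned into the Markdown shown above, the snippet below renders a single entry from structured fields. The `Notice` class and `render_notice` function are hypothetical names for this illustration; in MindStudio the formatting is handled by the generated workflow.

```python
from dataclasses import dataclass


@dataclass
class Notice:
    headline: str
    teaser: str
    summary: list[str]  # expected to hold the three summary bullets
    pdf_url: str


def render_notice(index: int, notice: Notice) -> str:
    """Format one notice as a Markdown section for the report email."""
    bullets = "\n".join(f"- {point}" for point in notice.summary)
    return (
        f"**Headline {index}:** {notice.headline}\n"
        f"**Teaser:** {notice.teaser}\n"
        f"**Summary:**\n"
        f"{bullets}\n"
        f"[Read Full Notice (PDF)]({notice.pdf_url})"
    )


# Sample values copied from the example email; "URL" is a placeholder
example = Notice(
    headline="Company XYZ Announces Layoff",
    teaser="500 employees will be affected starting next month.",
    summary=[
        "Reason: Financial restructuring",
        "Impacted Locations: New York, NY",
        "Effective Date: July 1, 2024",
    ],
    pdf_url="URL",
)
print(render_notice(1, example))
```

Joining several rendered notices with a blank line between them gives a report that covers every new notice found in one run.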
Step 6: Deploy and Automate
With everything verified, your MindStudio worker is ready for deployment.
- The automation runs hourly without manual intervention.
- You’ll receive an alert on the next hourly run after a new notice is published, so the delay is at most about an hour.
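Because the schedule fires at the start of each hour, the wait between a notice being published and the next check can be estimated with a few lines of Python. This is a small arithmetic sketch, assuming top-of-hour runs as described in Step 2; `next_hourly_run` is a hypothetical helper, not a MindStudio function.

```python
from datetime import datetime, timedelta


def next_hourly_run(now: datetime) -> datetime:
    """Return the next top-of-hour strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


# A notice published at 9:05 is picked up by the 10:00 run
published = datetime(2024, 7, 1, 9, 5)
wait = next_hourly_run(published) - published
print(wait)  # prints 0:55:00
```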