Building a MindStudio Worker for Website Monitoring and Notifications

Use Case

In this blog post, we will walk through creating a simple yet powerful MindStudio worker that monitors a website for updates and alerts you via email when new content is detected. For this example, we’ll pretend you’re a journalist tracking layoff or closure notices issued by the state of New York.

What We’re Building

Our goal: Monitor a website for new layoff or closure notices, extract data from PDFs, and summarize the key information in an email.

This worker will run automatically every hour and, for each new notice, provide:

  1. A headline and teaser
  2. Relevant metadata
  3. A three-bullet summary
  4. A link to the source PDF

Step 1: Generate a New Workflow

  1. Open MindStudio and create a new workflow.
  2. In the workflow request, specify:
    • Schedule: Run every hour.
    • Task:
      • Check a specific URL for new notices.
      • Scrape the PDF linked to each new notice.
      • Generate a high-level report summarizing the content.
      • Send this report as an email.
  3. Click Generate to let MindStudio build the workflow.

At this stage, MindStudio will propose the architecture for your workflow.

Step 2: Understand the Workflow Architecture

After generating, review the proposed workflow components:

  • Global Variable: Keeps track of previously processed notices.
  • Schedule: Runs hourly at the start of each hour.
  • Scraper: Scrapes the notice page and identifies linked PDFs.
  • Logic Block: Determines if there are new notices.
  • PDF Processor: Extracts text content from PDFs.
  • Email Sender: Formats the extracted content into Markdown and sends it via email.
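
To make the interplay between the Global Variable and the Logic Block concrete, here is a minimal Python sketch of the "is there anything new?" check. It is an illustration only, not MindStudio's generated code: the function name `find_new_notices` and the example.com URLs are hypothetical, and it assumes each notice can be identified by the URL of its PDF.

```python
from typing import Iterable


def find_new_notices(scraped_urls: Iterable[str], processed: set[str]) -> list[str]:
    """Return URLs that have not been processed yet.

    Page order is preserved and repeated links are reported only once.
    """
    seen = set(processed)  # copy so the caller's set is not modified here
    new_urls = []
    for url in scraped_urls:
        if url not in seen:
            seen.add(url)
            new_urls.append(url)
    return new_urls


# Hypothetical state carried between runs (the role of the global variable).
processed = {"https://example.com/notices/a.pdf"}
scraped = [
    "https://example.com/notices/a.pdf",
    "https://example.com/notices/b.pdf",
    "https://example.com/notices/b.pdf",  # same notice linked twice on the page
]

new_urls = find_new_notices(scraped, processed)
processed.update(new_urls)  # remember them so the next run skips them
```

Keeping the processed set separate from the comparison means a run that fails partway through can decide for itself when to record a notice as handled.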

If everything looks good, proceed to build the workflow.

Step 3: Explore the Generated Workflow

  1. System Prompt: MindStudio generates a system prompt that aligns with your request.
  2. Automation Schedule: The automation runs hourly and follows this sequence:
    • Load Previously Processed Notices using a custom function.
    • Scrape the Website for any new notices.
    • Check for New Content using a logic block.
  3. Processing the PDFs: If new notices are found, the system:
    • Extracts text content from the linked PDF.
    • Summarizes and formats the information into a readable email.
    • Sends the result via email.
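
The sequence above can be sketched as a single function for one hourly pass. This is a rough Python illustration under stated assumptions, not the workflow MindStudio builds: `run_once` and its callable parameters (`scrape`, `extract_text`, `summarize`, `send_email`) are hypothetical stand-ins for the scraper, PDF processor, summarizer, and email sender blocks.

```python
from typing import Callable


def run_once(
    processed: set[str],
    scrape: Callable[[], list[str]],
    extract_text: Callable[[str], str],
    summarize: Callable[[str, str], str],
    send_email: Callable[[str], None],
) -> list[str]:
    """Run one pass and return the notice URLs handled in this pass."""
    # Load previously processed notices, scrape, and check for new content.
    new_urls = [url for url in scrape() if url not in processed]
    if not new_urls:
        return []  # nothing new: skip PDF processing and send no email

    # Process each new PDF and collect one report section per notice.
    sections = []
    for url in new_urls:
        text = extract_text(url)
        sections.append(summarize(url, text))

    send_email("\n\n".join(sections))
    processed.update(new_urls)  # record notices only after the email is sent
    return new_urls
```

Passing the steps in as functions keeps the control flow testable with simple stubs, which mirrors how the debugger lets you inspect each block in isolation.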

Step 4: Run the Workflow in the Debugger

To test your setup:

  1. Run the workflow in Debug Mode to simulate the process.
  2. Observe the following steps:
    • Scraping the website for notices.
    • Identifying new PDFs.
    • Extracting and processing text content.
    • Generating the summarized results.
  3. Check the debug logs to confirm that the workflow executed each step correctly.
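
If you want to sanity-check what the debugger reports for the scraping step, you can list the PDF links on a page yourself. The following is a minimal sketch using only the Python standard library. The names `PdfLinkParser` and `extract_pdf_links` and the example.com URL are hypothetical, and it assumes notices are linked as ordinary `<a href>` tags whose path ends in `.pdf`, which may not hold for every site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare only the path so query strings do not hide the extension.
        if href and urlparse(href).path.lower().endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))


def extract_pdf_links(html: str, base_url: str) -> list[str]:
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


sample_html = (
    '<a href="/docs/n1.pdf">Notice 1</a>'
    '<a href="/about">About</a>'
    '<a href="n2.PDF?x=1">Notice 2</a>'
)
links = extract_pdf_links(sample_html, "https://example.com/notices/")
```

Comparing this list against the PDFs identified in the debug logs is a quick way to spot links the scraper missed.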

Step 5: Verify the Results

Once the workflow runs successfully:

  1. Open your email inbox.
  2. Verify that you received a report email containing, for each new notice:
    • A headline and teaser
    • Metadata and a three-bullet summary
    • A link to the source PDF

Example of the final output in email format:

**Headline 1:** Company XYZ Announces Layoff
**Teaser:** 500 employees will be affected starting next month.

**Summary:**
- Reason: Financial restructuring
- Impacted Locations: New York, NY
- Effective Date: July 1, 2024

[Read Full Notice (PDF)](URL)
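
For readers curious how a report in this shape might be assembled, here is a small Python sketch that renders one notice as Markdown. The `NoticeReport` dataclass and `format_notice` function are hypothetical names introduced for illustration, and the example.com link stands in for the real PDF URL. In the actual worker, the formatting is handled by the generated workflow.

```python
from dataclasses import dataclass


@dataclass
class NoticeReport:
    headline: str
    teaser: str
    summary: dict[str, str]  # label -> value, one entry per summary bullet
    pdf_url: str


def format_notice(index: int, report: NoticeReport) -> str:
    """Render a single notice in the Markdown layout shown above."""
    lines = [
        f"**Headline {index}:** {report.headline}",
        f"**Teaser:** {report.teaser}",
        "",
        "**Summary:**",
    ]
    lines += [f"- {label}: {value}" for label, value in report.summary.items()]
    lines += ["", f"[Read Full Notice (PDF)]({report.pdf_url})"]
    return "\n".join(lines)


report = NoticeReport(
    headline="Company XYZ Announces Layoff",
    teaser="500 employees will be affected starting next month.",
    summary={
        "Reason": "Financial restructuring",
        "Impacted Locations": "New York, NY",
        "Effective Date": "July 1, 2024",
    },
    pdf_url="https://example.com/notice.pdf",  # placeholder URL
)
markdown = format_notice(1, report)
```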

Step 6: Deploy and Automate

With everything verified, your MindStudio worker is ready for deployment.

  • The automation runs hourly without manual intervention.
  • You’ll receive an alert at the next hourly run after a new notice is published, so within about an hour of publication.
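
Because the worker runs at the start of each hour, a notice is picked up at the next hourly run rather than at the moment it is published. The short Python sketch below works through that timing. The function names are hypothetical, and it assumes runs happen exactly on the hour and ignores processing time.

```python
from datetime import datetime, timedelta


def next_hourly_run(now: datetime) -> datetime:
    """Return the next top-of-the-hour time strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


def pickup_delay(published: datetime) -> timedelta:
    """How long a notice waits before the next scheduled run sees it."""
    return next_hourly_run(published) - published


published = datetime(2024, 7, 1, 9, 15)
run_at = next_hourly_run(published)  # 2024-07-01 10:00
delay = pickup_delay(published)      # 45 minutes
```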