Dream Jar (◠‿・)—☆

Storing Gmail Newsletters in Google Sheets for AI Training

I wanted to start the AI Literary Cartographer Project by building a daily automation system that extracts content from BookBar newsletters (in Gmail), transforms the messages, and stores the results in Google Sheets. This feature is important as I'll need to capture rich context from all the newsletters I read on a regular basis in order to provide GPT with meaningful background for my personalised book recommendations.


Workflow Overview

The system is built in Pipedream and runs on a daily schedule. Here’s the complete pipeline:

Screenshot 2025-08-06 8


1. Trigger (Daily at 3:10 PM)

User story
As a system, I must automatically initiate the email scraping process daily, so that I can keep the database up to date with the latest BookBar newsletter content.

Steps

No code needed—just configure the time in the UI.

Screenshot 2025-08-06 8


2. Search Emails from BookBar

User story
As a system, I must search for emails from BookBar in Gmail, so that I can identify which messages need to be processed.

Steps

Screenshot 2025-08-06 8


3. Extract Email Content

User story
As a system, I must extract full MIME content from each email, so that I can access the subject, plain text, HTML, and timestamp.

Steps

Code

import { axios } from "@pipedream/platform"
import gmail from "@pipedream/gmail"

export default defineComponent({
  name: "Process Gmail Messages Array",
  description: "Process array of Gmail messages from search results and extract raw MIME data and content from each email including nested MIME parts, attachments, HTML content, plain text, headers, and metadata with proper error handling",
  type: "action",
  props: {
    gmail,
    q: {
      propDefinition: [
        gmail,
        "q",
      ],
      description: "Gmail search query to find messages. Use Gmail's standard search operators (e.g., 'from:example@gmail.com', 'has:attachment', 'subject:important')",
    },
    maxResults: {
      type: "integer",
      label: "Max Results",
      description: "Maximum number of messages to process (default: 10, max: 500)",
      optional: true,
      default: 10,
      min: 1,
      max: 500,
    },
    includeAttachmentData: {
      type: "boolean",
      label: "Include Attachment Data",
      description: "Whether to download and include the actual attachment data (base64 encoded). Warning: This may significantly increase processing time and memory usage.",
      optional: true,
      default: false,
    },
  },
  methods: {
    async processMimeParts(parts, messageId) {
      if (!parts || !Array.isArray(parts)) return [];

      const processedParts = [];

      for (const part of parts) {
        const processedPart = {
          partId: part.partId,
          mimeType: part.mimeType,
          filename: part.filename || null,
          headers: part.headers || [],
          body: {
            size: part.body?.size || 0,
            attachmentId: part.body?.attachmentId || null,
          },
        };

        if (part.body?.data) {
          try {
            processedPart.body.data = Buffer.from(part.body.data, 'base64').toString('utf-8');
          } catch (error) {
            processedPart.body.dataError = `Failed to decode body data: ${error.message}`;
          }
        }

        if (part.body?.attachmentId && this.includeAttachmentData) {
          try {
            const attachment = await this.gmail.getAttachment({
              messageId,
              attachmentId: part.body.attachmentId,
            });
            processedPart.body.attachmentData = attachment.data;
            processedPart.body.attachmentSize = attachment.size;
          } catch (error) {
            processedPart.body.attachmentError = `Failed to fetch attachment: ${error.message}`;
          }
        }

        if (part.parts && part.parts.length > 0) {
          processedPart.parts = await this.processMimeParts(part.parts, messageId);
        }

        processedParts.push(processedPart);
      }

      return processedParts;
    },

    extractContentByType(parts, mimeType) {
      if (!parts) return [];

      const content = [];

      for (const part of parts) {
        if (part.mimeType === mimeType && part.body?.data) {
          try {
            content.push({
              partId: part.partId,
              content: Buffer.from(part.body.data, 'base64').toString('utf-8'),
              size: part.body.size,
            });
          } catch (error) {
            content.push({
              partId: part.partId,
              error: `Failed to decode ${mimeType} content: ${error.message}`,
            });
          }
        }

        if (part.parts) {
          content.push(...this.extractContentByType(part.parts, mimeType));
        }
      }

      return content;
    },

    extractAttachments(parts) {
      if (!parts) return [];

      const attachments = [];

      for (const part of parts) {
        if (part.filename && part.body?.attachmentId) {
          attachments.push({
            partId: part.partId,
            filename: part.filename,
            mimeType: part.mimeType,
            attachmentId: part.body.attachmentId,
            size: part.body.size,
          });
        }

        if (part.parts) {
          attachments.push(...this.extractAttachments(part.parts));
        }
      }

      return attachments;
    },

    extractHeaders(message) {
      const headers = {};
      const headersList = message.payload?.headers || [];

      for (const header of headersList) {
        const key = header.name.toLowerCase();
        headers[key] = header.value;
      }

      return headers;
    },
  },

  async run({ $ }) {
    try {
      $.export("$summary", "Searching for Gmail messages...");

      const searchResults = await this.gmail.listMessages({
        q: this.q,
        maxResults: this.maxResults,
      });

      if (!searchResults.messages || searchResults.messages.length === 0) {
        $.export("$summary", "No messages found matching the search criteria");
        return {
          totalMessages: 0,
          messages: [],
        };
      }

      $.export("$summary", `Processing ${searchResults.messages.length} messages...`);

      const processedMessages = [];
      let successCount = 0;
      let errorCount = 0;

      for (let i = 0; i < searchResults.messages.length; i++) {
        const messageRef = searchResults.messages[i];

        try {
          const [fullMessage, rawMessage] = await Promise.all([
            this.gmail.getMessage({ id: messageRef.id }),
            this.gmail.getMessage({ id: messageRef.id, format: 'raw' }).catch(() => null),
          ]);

          const headers = this.extractHeaders(fullMessage);
          const mimeParts = await this.processMimeParts(fullMessage.payload.parts, messageRef.id);

          const textContentRaw = this.extractContentByType(fullMessage.payload.parts, 'text/plain');
          const htmlContentRaw = this.extractContentByType(fullMessage.payload.parts, 'text/html');

          const textContent = textContentRaw.map(p => p.content).join('\n') || '';
          const htmlContent = htmlContentRaw.map(p => p.content).join('\n') || '';

          const attachments = this.extractAttachments(fullMessage.payload.parts);

          const processedMessage = {
            id: fullMessage.id,
            threadId: fullMessage.threadId,
            labelIds: fullMessage.labelIds || [],
            snippet: fullMessage.snippet,
            historyId: fullMessage.historyId,
            internalDate: fullMessage.internalDate,
            sizeEstimate: fullMessage.sizeEstimate,
            headers,
            subject: headers.subject || '',
            from: headers.from || '',
            to: headers.to || '',
            date: headers.date || '',
            rawMimeData: rawMessage?.raw || null,
            payload: {
              partId: fullMessage.payload.partId,
              mimeType: fullMessage.payload.mimeType,
              filename: fullMessage.payload.filename,
              headers: fullMessage.payload.headers,
              parts: mimeParts,
            },
            content: {
              html: htmlContent,
              text: textContent,
            },
            attachments,
            processedAt: new Date().toISOString(),
            processingIndex: i + 1,
          };

          processedMessages.push(processedMessage);
          successCount++;

        } catch (error) {
          errorCount++;
          processedMessages.push({
            id: messageRef.id,
            error: error.message,
            errorType: error.constructor.name,
            processedAt: new Date().toISOString(),
            processingIndex: i + 1,
          });
        }
      }

      const summary = `Successfully processed ${successCount} messages, ${errorCount} errors`;
      $.export("$summary", summary);

      return {
        searchQuery: this.q,
        totalMessages: searchResults.messages.length,
        successCount,
        errorCount,
        includeAttachmentData: this.includeAttachmentData,
        processedAt: new Date().toISOString(),
        messages: processedMessages,
      };

    } catch (error) {
      throw new Error(`Failed to process Gmail messages: ${error.message}`);
    }
  },
});

Screenshot 2025-08-06 8


4. Flatten Email Data (Node.js)

User story
As a system, I must flatten the nested structure of Gmail MIME messages, so that I can work with a clean array of objects.

Steps

Code

export default defineComponent({
  props: {
    emailMessages: {
      type: "any",
      label: "Email Messages",
    },
  },
  async run({ $ }) {
    return this.emailMessages.map(msg => ({
      timestamp: msg.internalDate,
      subject: msg.payload.headers.find(h => h.name === "Subject")?.value || "(No Subject)",
      text: Buffer.from(msg.payload.parts?.[0]?.body?.data || "", "base64").toString("utf-8"),
      html: Buffer.from(msg.payload.parts?.[1]?.body?.data || "", "base64").toString("utf-8")
    }));
  }
});

Screenshot 2025-08-06 8


5. Transform for Sheets (Node.js)

User story
As a system, I must transform the email objects into row arrays, so that I can write them to Google Sheets.

Steps

Code

export default defineComponent({
  props: {
    flattenedEmails: {
      type: "any",
      label: "Flattened Email Data"
    }
  },
  async run({ $ }) {
    return this.flattenedEmails.map(email => [
      email.timestamp,
      email.subject,
      email.text,
      email.html
    ]);
  }
});

Screenshot 2025-08-06 8


6. Add Rows to Google Sheet

User story
As a system, I must store each email as a new row in Google Sheets, so that the data can be used later by the AI Literary Cartographer.

Steps

Screenshot 2025-08-06 8

Screenshot 2025-08-06 8


💬 Reflections

This was a surprisingly complex workflow 🥵.

what I learned (the hard way):


What’s next