chore: fix outlook quotes issue and release v0.1.2 (#18)
BlankParticle authored Apr 8, 2024
1 parent a713fb0 commit a2d6807
Showing 8 changed files with 314 additions and 7 deletions.
7 changes: 5 additions & 2 deletions packages/mailtools/README.md
@@ -116,8 +116,11 @@ We picked up on `tempo-email-parser` which was not being maintained any more and

 ## Limitations

-It seems like we are unable to extract outlook signatures correctly. We need more source emails to add to the parsing tests and functions.
-If you can help out with this, please open an issue with some html emails we can use
+It is nearly impossible to parse every kind of Outlook email. We have implemented some measures to handle them, but we are still unable to extract certain kinds of signatures. Parsing those reliably is not feasible for us without some kind of LLM, and even that might not be accurate.
+
+We have covered major providers such as Gmail, newer Outlook clients, Proton Mail, and a few others.
+
+You can help us improve this package by testing your email clients and signatures at <https://tools.unin.sh> and reporting the results through the built-in feedback system.

 ## License

2 changes: 1 addition & 1 deletion packages/mailtools/jsr.json
@@ -1,5 +1,5 @@
 {
   "name": "@u22n/mailtools",
-  "version": "0.1.1",
+  "version": "0.1.2",
   "exports": "./src/index.ts"
 }
2 changes: 1 addition & 1 deletion packages/mailtools/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@u22n/mailtools",
-  "version": "0.1.1",
+  "version": "0.1.2",
   "type": "module",
   "description": "Processes HTML email for display. Extracts quotations and more. Successor to tempo-email-parser.",
   "main": "./dist/index.js",
20 changes: 18 additions & 2 deletions packages/mailtools/src/removeQuotations.ts
@@ -28,7 +28,7 @@ function removeQuotations($: CheerioAPI): { didFindQuotation: boolean } {
  * Returns a selection of all quote elements that should be removed
  */
 function findAllQuotes($: CheerioAPI) {
-  const quoteElements = $(
+  let quoteElements = $(
     [
       '.gmail_quote',
       'blockquote',
@@ -38,7 +38,11 @@ function findAllQuotes($: CheerioAPI) {
       // ENHANCEMENT: Add findQuotesAfter__OriginalMessage__
     ].join(', ')
   );
-  // console.log(quoteElements.html());
+
+  if (quoteElements.length === 0) {
+    quoteElements = findAllQuotesOutlook($);
+  }
+
   // Ignore inline quotes. Quotes that are followed by non-quote blocks.
   const quoteElementsSet = new Set(toArray(quoteElements));
   const withoutInlineQuotes = quoteElements.filter(
@@ -48,6 +52,18 @@ function findAllQuotes($: CheerioAPI) {
   return withoutInlineQuotes;
 }

+// It is always Outlook that structures everything differently
+function findAllQuotesOutlook($: CheerioAPI) {
+  const quoteStart = $("div[style*='border-top']").first();
+  const quotation = quoteStart.add(quoteStart.nextAll());
+  if (quotation.length === 0) {
+    return $();
+  }
+  const newHolder = $('<div></div>');
+  quotation.each((_, el) => void newHolder.append($(el)));
+  return newHolder;
+}
+
 /**
  * Returns true if the element looks like an inline quote:
  * it is followed by unquoted elements
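
The Outlook fallback in `findAllQuotesOutlook` can be illustrated without cheerio. The sketch below is a simplified, hypothetical model: `SiblingBlock`, `isOutlookDivider`, and `splitAtOutlookQuote` are names invented for this example and are not part of the package. It assumes one flat list of sibling elements, whereas the real selector searches the whole document. It only shows the heuristic: the first `div` whose inline style contains `border-top`, together with everything after it, is treated as the quotation.

```typescript
// Hypothetical model of one top-level sibling in an email body (not a package type).
interface SiblingBlock {
  tag: string;
  style: string; // the element's inline style attribute
  text: string;
}

// Mirrors the selector "div[style*='border-top']" from the diff.
function isOutlookDivider(block: SiblingBlock): boolean {
  return block.tag === "div" && block.style.includes("border-top");
}

// Splits the siblings at the first divider: the divider and all later
// siblings count as the quotation, everything before it is kept.
function splitAtOutlookQuote(blocks: SiblingBlock[]): {
  kept: SiblingBlock[];
  quoted: SiblingBlock[];
} {
  const start = blocks.findIndex(isOutlookDivider);
  if (start === -1) {
    return { kept: blocks, quoted: [] };
  }
  return { kept: blocks.slice(0, start), quoted: blocks.slice(start) };
}

// Sample data loosely modeled on the Outlook fixture added in this commit.
const body: SiblingBlock[] = [
  { tag: "p", style: "", text: "Received but what about signature & attachments?" },
  { tag: "div", style: "border:none;border-top:solid #E1E1E1 1.0pt", text: "Van: ..." },
  { tag: "p", style: "", text: "Signature for sure isn't filtered..." },
];

const { kept, quoted } = splitAtOutlookQuote(body);
console.log(kept.length, quoted.length); // 1 2
```

Because the heuristic keys on a single inline style, any unrelated `div` with a top border that appears earlier in the message would also be treated as the start of the quotation. That is presumably why the commit only uses it when the other selectors find nothing.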
7 changes: 6 additions & 1 deletion packages/mailtools/src/removeSignatures.ts
@@ -106,10 +106,15 @@ function findAllSignatures($: CheerioAPI) {
 }

 function findAllSignaturesOutlook($: CheerioAPI) {
+  // This works in most cases, but fails in cases like outlook-client-5 in the fixtures.
+  // There is nothing we can do in that case.
+  // I had to leave that test case with part of the signature in it, so the test is effectively invalid.
+  // It is kept for future reference.
   const start = $(
     ':has(>[style*="mso-ligatures"], >[style*="mso-fareast"])'
   ).first();
-  const signatureTags = start.add(start.nextAll());
+  // Outlook native signatures usually end at a div with a border-top style
+  const signatureTags = start.add(start.nextUntil("div[style*='border-top']"));
   const newHolder = $('<div></div>');
   signatureTags.each((_, el) => void newHolder.append($(el)));
   return newHolder;
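
The change from `nextAll()` to `nextUntil(...)` in `findAllSignaturesOutlook` bounds the signature so that it no longer includes the quoted reply. The sketch below models that bounding step without cheerio. `OutlookBlock`, `hasMsoChild`, `isQuoteDivider`, and `findSignatureRange` are hypothetical names for illustration only, and a flat sibling list is assumed.

```typescript
// Hypothetical model of one sibling element in an Outlook email body.
interface OutlookBlock {
  tag: string;
  style: string; // the element's own inline style
  childStyles: string[]; // inline styles of its direct children
}

// Mirrors ':has(>[style*="mso-ligatures"], >[style*="mso-fareast"])'.
function hasMsoChild(block: OutlookBlock): boolean {
  return block.childStyles.some(
    (s) => s.includes("mso-ligatures") || s.includes("mso-fareast")
  );
}

// Mirrors "div[style*='border-top']", the divider above the quoted reply.
function isQuoteDivider(block: OutlookBlock): boolean {
  return block.tag === "div" && block.style.includes("border-top");
}

// Returns the [start, end) index range treated as the signature,
// or null when no starting element is found.
function findSignatureRange(
  blocks: OutlookBlock[]
): { start: number; end: number } | null {
  const start = blocks.findIndex(hasMsoChild);
  if (start === -1) return null;
  // Like nextUntil: take following siblings and stop before the first divider.
  const divider = blocks.findIndex((b, i) => i > start && isQuoteDivider(b));
  const end = divider === -1 ? blocks.length : divider;
  return { start, end };
}

// Sample data loosely modeled on the Outlook fixture added in this commit.
const siblings: OutlookBlock[] = [
  { tag: "p", style: "", childStyles: ["font-size:10.0pt;font-family:Raleway"] },
  { tag: "p", style: "", childStyles: ["font-family:Raleway;mso-fareast-language:EN-US"] },
  { tag: "table", style: "margin-left:-.4pt;border-collapse:collapse", childStyles: [] },
  { tag: "div", style: "border:none;border-top:solid #E1E1E1 1.0pt", childStyles: [] },
  { tag: "p", style: "", childStyles: [] },
];

const signature = findSignatureRange(siblings);
console.log(signature?.start, signature?.end); // 1 3
```

In this model the range ends just before the divider, so the quoted header block and the text after it are left for the quotation logic to handle.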
@@ -0,0 +1,107 @@
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Aptos;}
@font-face
{font-family:"Segoe UI Emoji";
panose-1:2 11 5 2 4 2 4 2 2 3;}
@font-face
{font-family:Raleway;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="NL-BE" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway">Received but what about signature &amp; attachments?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway">Test attachment.txt included!<br>
<br>
What about a screenshot?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><img width="604" height="347" style="width:6.2916in;height:3.6145in" id="Afbeelding_x0020_2" src="cid:12345678"></span><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:&quot;Segoe UI Emoji&quot;,sans-serif">&#129315;</span><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"> screenshot end
</span><span lang="EN-GB" style="font-size:10.0pt;font-family:&quot;Segoe UI Emoji&quot;,sans-serif">&#128515;</span><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;color:#0E2432">Met vriendelijke groet<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" style="margin-left:-.4pt;border-collapse:collapse">
<tbody>
<tr>
<td width="85" valign="top" style="width:63.8pt;padding:0cm 5.4pt 0cm 5.4pt">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#C4014B;mso-fareast-language:NL"><img width="95" height="95" style="width:.9895in;height:.9895in" id="Afbeelding_x0020_12" src="cid:12345678"></span></b><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#C4014B;mso-fareast-language:NL"><o:p></o:p></span></b></p>
</td>
<td width="541" valign="top" style="width:406.0pt;padding:0cm 5.4pt 0cm 5.4pt">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#C4014B;mso-fareast-language:NL"><br>
</span></b><b><span style="font-size:10.0pt;font-family:Raleway;color:#C4014B;mso-fareast-language:NL">Your Name</span></b><b><span style="font-size:10.0pt;font-family:&quot;Something&quot;,sans-serif;color:#C4014B;mso-fareast-language:NL">
<br>
</span></b><b><span style="font-size:10.0pt;font-family:Raleway;color:#0E2432;mso-fareast-language:NL">Your Position<br>
Your Company<o:p></o:p></span></b></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:Raleway;color:#0E2432;mso-fareast-language:NL">Tel. 123 456 789</span></b><span style="font-size:10.0pt;font-family:Raleway;color:#0E2432;mso-fareast-language:NL"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black;mso-fareast-language:NL"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;color:gray;mso-fareast-language:NL">Company -&nbsp;</span><span style="font-size:10.0pt;font-family:&quot;Calibri&quot;,sans-serif"><a href="http://www.example.com/"><span style="font-family:Raleway;color:gray;mso-fareast-language:NL">example.com</span></a></span><span style="font-size:10.0pt;font-family:Raleway;color:black;mso-fareast-language:NL"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;color:gray;mso-fareast-language:NL">Address<br>
</span><span style="font-size:8.0pt;font-family:Raleway;color:gray;mso-fareast-language:NL"><br>
</span><span style="font-size:7.0pt;font-family:Raleway;color:gray;mso-fareast-language:NL">Deze e-mail en eventuele bijlagen zijn vertrouwelijk en kunnen onder het wettelijk zwijgrecht vallen.<br>
Indien u niet de geadresseerde bent, is het ten strengste verboden deze e-mail publiek te maken, te reproduceren, te verdelen, of op een andere manier te verspreiden of te gebruiken.<br>
Indien u dit bericht per vergissing hebt ontvangen, gelieve dan de verzender onmiddellijk op de hoogte te stellen en deze e-mail te verwijderen.</span><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="NL" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">Van:</span></b><span lang="NL" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">
<a href="mailto:[email protected]">[email protected]</a> &lt;<a href="mailto:[email protected]">[email protected]</a>&gt;
<br>
<b>Verzonden:</b> zaterdag 6 april 2024 20:48<br>
<b>Aan:</b> Jelle Revyn &lt;<a href="mailto:[email protected]">[email protected]</a>&gt;<br>
<b>Onderwerp:</b> Test from unin.me<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p>Signature for sure isn't filtered... Do I still get in spam box?<o:p></o:p></p>
</div>
</body>
</html>

@@ -0,0 +1,128 @@
<html
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40"
>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="Generator" content="Microsoft Word 15 (filtered medium)" />

<meta name="viewport" content="width=device-width" />
<style>
.customStyle {
background: red;
}
</style>
</head>
<body lang="NL-BE" link="#0563C1" vlink="#954F72" style="word-wrap: break-word">
<div class="WordSection1">
<p class="MsoNormal">
<span lang="EN-GB" style="font-size: 10pt; font-family: Raleway">Received but what about signature &amp; attachments?</span>
</p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<p class="MsoNormal">
<span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"
>Test attachment.txt included!<br />
<br />
What about a screenshot?</span
>
</p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<p class="MsoNormal">
<span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"
><img width="604" height="347" style="width: 6.2916in; height: 3.6145in" id="Afbeelding_x0020_2" src="cid:12345678" /></span
><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"></span>
</p>
<p class="MsoNormal">
<span lang="EN-GB" style="font-size: 10pt; font-family: &quot;Segoe UI Emoji&quot;, sans-serif">🤣</span
><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"> screenshot end </span
><span lang="EN-GB" style="font-size: 10pt; font-family: &quot;Segoe UI Emoji&quot;, sans-serif">😃</span
><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"></span>
</p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway; color: #0e2432">Met vriendelijke groet</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway; mso-fareast-language: EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" style="margin-left: -0.4pt; border-collapse: collapse">
<tbody>
<tr>
<td width="85" valign="top" style="width: 63.8pt; padding: 0cm 5.4pt 0cm 5.4pt">
<p class="MsoNormal" style="margin-bottom: 12pt">
<b
><span style="font-size: 10pt; font-family: &quot;Verdana&quot;, sans-serif; color: #c4014b; mso-fareast-language: NL"
><img width="95" height="95" style="width: 0.9895in; height: 0.9895in" id="Afbeelding_x0020_12" src="cid:12345678" /></span></b
><b><span style="font-size: 10pt; font-family: &quot;Verdana&quot;, sans-serif; color: #c4014b; mso-fareast-language: NL"></span></b>
</p>
</td>
<td width="541" valign="top" style="width: 406pt; padding: 0cm 5.4pt 0cm 5.4pt">
<p class="MsoNormal" style="margin-bottom: 12pt">
<b
><span style="font-size: 10pt; font-family: &quot;Verdana&quot;, sans-serif; color: #c4014b; mso-fareast-language: NL"
><br /> </span></b
><b><span style="font-size: 10pt; font-family: Raleway; color: #c4014b; mso-fareast-language: NL">Your Name</span></b
><b
><span style="font-size: 10pt; font-family: &quot;Something&quot;, sans-serif; color: #c4014b; mso-fareast-language: NL">
<br /> </span></b
><b
><span style="font-size: 10pt; font-family: Raleway; color: #0e2432; mso-fareast-language: NL"
>Your Position<br />
Your Company</span
></b
>
</p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal">
<b><span style="font-size: 10pt; font-family: Raleway; color: #0e2432; mso-fareast-language: NL">Tel. 123 456 789</span></b
><span style="font-size: 10pt; font-family: Raleway; color: #0e2432; mso-fareast-language: NL"></span>
</p>
<p class="MsoNormal">
<span style="font-size: 10pt; font-family: &quot;Calibri&quot;, sans-serif; color: black; mso-fareast-language: NL">&nbsp;</span>
</p>
<p class="MsoNormal">
<span style="font-size: 10pt; font-family: Raleway; color: gray; mso-fareast-language: NL">Company -&nbsp;</span
><span style="font-size: 10pt; font-family: &quot;Calibri&quot;, sans-serif"
><a href="http://www.example.com/" title="http://www.example.com/"
><span style="font-family: Raleway; color: gray; mso-fareast-language: NL">example.com</span></a
></span
><span style="font-size: 10pt; font-family: Raleway; color: black; mso-fareast-language: NL"></span>
</p>
<p class="MsoNormal">
<span style="font-size: 10pt; font-family: Raleway; color: gray; mso-fareast-language: NL">Address<br /> </span
><span style="font-size: 8pt; font-family: Raleway; color: gray; mso-fareast-language: NL"><br /> </span
><span style="font-size: 7pt; font-family: Raleway; color: gray; mso-fareast-language: NL"
>Deze e-mail en eventuele bijlagen zijn vertrouwelijk en kunnen onder het wettelijk zwijgrecht vallen.<br />
Indien u niet de geadresseerde bent, is het ten strengste verboden deze e-mail publiek te maken, te reproduceren, te verdelen, of op een
andere manier te verspreiden of te gebruiken.<br />
Indien u dit bericht per vergissing hebt ontvangen, gelieve dan de verzender onmiddellijk op de hoogte te stellen en deze e-mail te
verwijderen.</span
><span style="font-size: 11pt; font-family: &quot;Calibri&quot;, sans-serif"></span>
</p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway; mso-fareast-language: EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway; mso-fareast-language: EN-US">&nbsp;</span></p>
<div style="border: none; border-top: solid #e1e1e1 1pt; padding: 3pt 0cm 0cm 0cm">
<p class="MsoNormal">
<b><span lang="NL" style="font-size: 11pt; font-family: &quot;Calibri&quot;, sans-serif">Van:</span></b
><span lang="NL" style="font-size: 11pt; font-family: &quot;Calibri&quot;, sans-serif">
<a href="mailto:[email protected]" title="mailto:[email protected]">[email protected]</a> &lt;<a
href="mailto:[email protected]"
title="mailto:[email protected]"
>[email protected]</a
>&gt;
<br />
<b>Verzonden:</b> zaterdag 6 april 2024 20:48<br />
<b>Aan:</b> Jelle Revyn &lt;<a href="mailto:[email protected]" title="mailto:[email protected]">[email protected]</a>&gt;<br />
<b>Onderwerp:</b> Test from <a href="http://unin.me" target="_blank" rel="noopener noreferrer" title="http://unin.me">unin.me</a></span
>
</p>
</div>
<p class="MsoNormal">&nbsp;</p>
<p>Signature for sure isn't filtered... Do I still get in spam box?</p>
</div>
</body>
</html>