Python: parsing emails with embedded images
I am working on an application that connects to a mail server using the Python POP3 library, parses the emails, and puts them into a database.
I can already parse text emails, HTML emails, and attachments. Now I am stuck on emails that contain embedded images. The message shows a cid: code for the images in the src attribute of the img tag, and the image itself as bytes. I am not sure how to get the images and map them to the CIDs.
Please suggest.
Thanks in advance.
Below is the email content I am getting:
content-type: multipart/alternative; boundary="php-alt-e0af773d09fadf5208f69aecffcb4de888824263"

--php-alt-e0af773d09fadf5208f69aecffcb4de888824263
content-type: text/plain

hi, testing embedded images email!

--php-alt-e0af773d09fadf5208f69aecffcb4de888824263
content-type: multipart/related; boundary="php-related-e0af773d09fadf5208f69aecffcb4de888824263"

--php-alt-e0af773d09fadf5208f69aecffcb4de888824263
content-type: text/html

<html>
<head> <title>test html mail</title> </head>
<body>
<font color='red'>hai, me!</font> here picture:
<img src="cid:php-cid-e0af773d09fadf5208f69aecffcb4de888824263" />
</body>
</html>

--php-related-e0af773d09fadf5208f69aecffcb4de888824263
content-type: image/gif
content-transfer-encoding: base64
content-id: <php-cid-e0af773d09fadf5208f69aecffcb4de888824263>

ivborw0kggoaaaansuheugaaaeyaaaagcamaaacyxf7xaaaagxrfwhrtb2z0d2fyzqbbzg9izsbj
bwfnzvjlywr5ccllpaaaawbqtfrf////onkwy6zztnc08/304+p/6/psrhgpzypwghctwqfwe7pz
wznfwna+q2uqgpz5jgcz4ezj7e3/6oj/tbw62tr/aadik1ssuhq6okesi0um5phkaaaaazhifhx6
ymjkwhdjy5lbi6yfw5ru0+lsnq2vmz6mm8is8vl/dxvzrerfjvujrnalcrntkzgrlnyslswj3e3d
7fxwstirwyjb3ergyeti9vb/iiiigokbd6v0np6ce51ru2pdqmqlvvvwtnpfhcn7ntu2ryuqpbwd
rkysohcn5vbql6eomwybmkui+fn/uostk6ylzgrm7f7tllgkoxg20dvniiiiguuer4q0inmcaytf
3+/e3d3czd7kjy2nnb6wtdozkwkmhoagujnnjl+fhlt7jlp9if0z/v7/0tlrqrijvx9utmza+v38
qko5sw5evya9jkwpmzwocnjub7rnfzpy3vpcaghkhywdbm5rhisirozgn0gxm6aq/pz/oyayxm1v
pkspehh2q1m5oqkgiaz+dz1vbqratvu4k7gfe6xqpr6c1+rb3utcfcdx0d3qk7ephaj6cqvstp5h
xnza1eztvots7e7uv968+v76xtpbplczm7ovydfddk1t+fn7+vt91ntddprpvmnbllyugkrymzmw
u9a5dati9vr35exugrfztvy2/v//r5m5ial+zdbjcjjn8/jz+f73sv89ererel1vob2tuvw7orgx
ymtu///+yyznkakgmdkur106iiid9/b5vwxnmbwoudy0j4+n+//9/v/8dw8pd5xnf3+inf8yjp2d
frz2chb30ufzb3bt2+hy3e3wqkqiljcruw09q8+xlmowoxahmbii4+xnjr6p5o/n5/dkek9mqebe
8vf5//r/9ft4u5q9hcqglnkndh0fljsxa0uac1cjgl0kwazqwc69yn3k/f76drvuqn0iltkzejds
lq+pv9hbn1ytv21fkb6bkb6kmlshtnc5t9y5dikehlz/w3blmeoddqvi4vfk////u8m4kgaaaqb0
I assume you are using Python's email package? It should handle the images fine. If you need to decode the image yourself, you need to have a look at the encoding, in this case base64. There is a module for encoding and decoding base64 in the stdlib, too.
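To make the decoding step concrete, here is a minimal sketch using the stdlib email package. The message in RAW is a small hand-built example, not the email from the question; the boundary "b1", the Content-ID "img1", and the fake image bytes are assumptions chosen only for illustration.

```python
import base64
from email import message_from_bytes

# Hypothetical stand-in for real image data, base64-encoded as a mail client would.
FAKE_IMAGE = b"GIF89a-fake-image-bytes"
ENCODED = base64.b64encode(FAKE_IMAGE).decode("ascii")

# Hypothetical message; in the real application these bytes come from the POP3 server.
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/related; boundary="b1"\r\n'
    "\r\n"
    "--b1\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    '<html><body><img src="cid:img1"></body></html>\r\n'
    "--b1\r\n"
    "Content-Type: image/gif\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "Content-ID: <img1>\r\n"
    "\r\n"
    f"{ENCODED}\r\n"
    "--b1--\r\n"
).encode("ascii")

msg = message_from_bytes(RAW)

images = []
for part in msg.walk():
    if part.get_content_maintype() == "image":
        # decode=True undoes the Content-Transfer-Encoding (base64 here)
        # and returns the raw image bytes.
        images.append(part.get_payload(decode=True))

# Decoding by hand with the base64 module gives the same bytes.
manual = base64.b64decode(ENCODED)
```

In most cases get_payload(decode=True) is enough, and the base64 module is only needed when the payload is handled outside the email package.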
As for the mapping, look at the Content-ID header of the images and create a dict that maps the content IDs to the MIME parts. To resolve the URLs in src, check if they start with 'cid:' (i.e. they resolve inside the MIME document itself), strip off the prefix, and look them up in the dictionary created before.
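The mapping described above could be sketched roughly as follows. The helper names build_cid_map, cid_references, and resolve_src are made up for this example, the sample message is hand-built, and the regular expression is a simplification that only handles quoted src attributes.

```python
import base64
import re
from email import message_from_bytes

def build_cid_map(msg):
    """Map each Content-ID (angle brackets removed) to its MIME part."""
    cid_map = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            cid_map[cid.strip().strip("<>")] = part
    return cid_map

def cid_references(html):
    """Return the content IDs referenced by src="cid:..." in the HTML."""
    return re.findall(r'src=["\']cid:([^"\']+)["\']', html, flags=re.IGNORECASE)

def resolve_src(src, cid_map):
    """Return decoded bytes for a 'cid:' URL, or None if it is not one or is unknown."""
    if not src.lower().startswith("cid:"):
        return None
    part = cid_map.get(src[4:])  # strip the 'cid:' prefix
    return part.get_payload(decode=True) if part is not None else None

# Hypothetical sample message, used only to exercise the helpers.
ENCODED = base64.b64encode(b"fake-image").decode("ascii")
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/related; boundary="b1"\r\n'
    "\r\n"
    "--b1\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    '<html><body><img src="cid:img1"></body></html>\r\n'
    "--b1\r\n"
    "Content-Type: image/gif\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "Content-ID: <img1>\r\n"
    "\r\n"
    f"{ENCODED}\r\n"
    "--b1--\r\n"
).encode("ascii")

msg = message_from_bytes(RAW)
cid_map = build_cid_map(msg)

# Pull the HTML body out and find which content IDs it references.
html_part = next(p for p in msg.walk() if p.get_content_type() == "text/html")
html = html_part.get_payload(decode=True).decode("ascii")
referenced = cid_references(html)
image_bytes = resolve_src("cid:" + referenced[0], cid_map)
```

From here the bytes can be stored in the database next to the content ID, or the src can be rewritten to wherever the image ends up being served from.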