After reading the Herald this morning, I sent emails to a few ISPs to let them know that if they want to develop a plugin for their current systems to improve spam detection, I am happy to offer my expertise on the algorithms. I gave them a long list of references to computing papers that describe algorithms for extracting text embedded in a spam image. The extracted text can then be run through a normal Bayesian spam filter. To the best of my knowledge, no ISP here in NZ offers text recognition yet. From the Herald's description this morning, the majority of those spams were text embedded in images.
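To make the two-stage idea concrete, here is a minimal sketch in Python of how extracted text could be handed to a Bayesian filter. It is only an illustration under stated assumptions: the `ocr` argument is a hypothetical stand-in for whichever text-extraction algorithm from the papers below is used, and the class name, training messages and 0.9 threshold are invented for the example, not taken from any ISP's system.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message labelled 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated P(spam | text) as a float in [0, 1]."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = len(vocab) + 1  # +1 reserves a slot for unseen words
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed log prior plus the sum of smoothed log likelihoods.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalise the two log scores into a probability.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


def classify_image(image_bytes, ocr, spam_filter, threshold=0.9):
    """Extract text with the supplied OCR callable, then score that text.

    `ocr` is an assumption: any callable mapping image bytes to a string.
    """
    extracted = ocr(image_bytes)
    return spam_filter.spam_probability(extracted) >= threshold


# Illustrative training data only; a real filter needs a large corpus.
demo_filter = NaiveBayesFilter()
demo_filter.train("cheap pills buy now", "spam")
demo_filter.train("buy cheap stock now", "spam")
demo_filter.train("meeting agenda for monday", "ham")
demo_filter.train("lunch on monday", "ham")


def fake_ocr(image_bytes):
    """Stand-in for a real text-extraction step: treats the bytes as ASCII."""
    return image_bytes.decode("ascii")


print(classify_image(b"buy cheap pills", fake_ocr, demo_filter))
```

The point of the split is that the filter never needs to know the text came from an image, so an existing Bayesian filter could be reused unchanged once a text-extraction step is placed in front of it.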
=========== Text Recognition Algorithms ============
"Embedded-Text Detection and Its Application to Anti-Spam Filtering" http://lbmedia.ece.ucsb.edu/resources/ref/thesis.pdf
"Text identification in complex background using SVM" (available in PS & PDF format) http://citeseer.ist.psu.edu/556015.html
"A Survey of Text Detection and Recognition in Images and Videos" (available in PS format) ftp://ftp.idiap.ch/pub/reports/2000/rr-00-38.ps.gz
"Detection of Text on Road Signs From Video" http://www.informedia.cs.cmu.edu/documents/wu_its05.pdf
"Extraction and Recognition of Artificial Text in Multimedia Documents" (available in PS & PDF format) http://citeseer.ist.psu.edu/503217.html
"Model based text detection in images" http://citeseer.ist.psu.edu/724755.html
"Text Information Extraction in Images: A Review" http://www.cse.msu.edu/prip/Files/TextDetectionSurvey.pdf
"Finding text in images" (available in PS & PDF format) http://citeseer.ist.psu.edu/wu97finding.html
"Extraction of Text from Images" (available in PS & PDF format) http://citeseer.ist.psu.edu/nath02extraction.html
"TextFinder: An automatic system to detect and recognize text in images" http://citeseer.ist.psu.edu/wu97textfinder.html
"Word extraction using irregular pyramid" http://www.comp.nus.edu.sg/labs/chime/da/paperdownload/loo01spie.pdf
"Text extraction from gray scale document images using edge information" http://www.comp.nus.edu.sg/~tancl/Papers/Earlier%20Papers/yuan01icdar.pdf
"Separation of overlapping text from graphics" http://www.comp.nus.edu.sg/labs/chime/da/paperdownload/cao01icdar.pdf
"Localizing and segmenting text in images and videos" http://www.informatik.uni-mannheim.de/pi4/publications/Lienhart2002a.pdf
"Word Image Retrieval Using Binary Features" http://www.cedar.buffalo.edu/~srihari/papers/SPIE2004.pdf
"Automatic identification and skew estimation of text lines in real scene images" http://tev.itc.it/people/messelod/Papers/TextLocalization.pdf
"A localization/verification scheme for finding text in images and video frames based on contrast independent features and machine learning methods" http://lts1pc19.epfl.ch/repository/Chen2004_776.pdf
"Human-Perception Handwritten Character Recognition using Wavelets" http://citeseer.ist.psu.edu/539423.html
"Classification and Learning for Character Recognition: Comparison of Methods and Remaining Problems" http://www.dsi.unifi.it/NNLDAR/Papers/01-NNLDAR05-Liu.pdf
"Self-supervised adaptation for on-line script text recognition" http://elcvia.cvc.uab.es/public/articles/0502/a2004042-4-art.pdf
"Online and Offline Character Recognition Using Alignment to Prototypes" http://cs-people.bu.edu/athitsos/publications/alon_icdar2005.pdf
"Character Extraction from Documents using Wavelet" http://citeseer.ifi.unizh.ch/hwang98character.html
"Machine Printed Text and Handwriting Identification in Noisy Document Images" http://citeseer.ist.psu.edu/zheng04machine.html