Google Groups is blocked here. I got through via u997, found this good article on tesseract OCR, and am reposting it here for safekeeping.

Since the above wiki documentation is unclear to anyone not versed in Visual C++ incantations (if you're just a managed environment .NET developer like me you're basically screwed), I've fluffed it up a bit here. I took the following steps to compile tesseract with compressed/multipage TIFF support on a 64-bit Windows 7 system.

1. Download tesseract 2.04. Unpack it. In this example I've unpacked to C:\projects\tesseract-2.04. Windows 7 still doesn't understand .tar.gz out of the box. My recommendation is to get a copy of 7-Zip.

2. Download the language files you need. I need German and English. I unpack these to the tessdata subdirectory, C:\projects\tesseract-2.04\tessdata.

3. Install libtiff. On my (64-bit) system the suggested install directory is C:\Program Files (x86)\GnuWin32. Underneath this directory are a bunch of subdirectories containing the files we'll need to compile tesseract with TIFF support, namely include, bin and lib.

4. Add C:\Program Files (x86)\GnuWin32\bin to your PATH environment variable so that the output tesseract.exe can find the libtiff DLL. Restart.

5. Open the VC solution (tesseract.sln).

6. Change the solution configuration to "Release" mode. Note that if you later change back to Debug mode, you'll need to set up all the following again...

7. In the solution explorer right click the solution node (Solution 'tesseract') and click "Properties". Change to "Configuration Properties" and select the "Release" configuration from the dropdown at the top of the window. Then navigate to: Tools -> Options -> Projects and Solutions -> VC++ Directories. Here we'll add the full paths of the lib and include subdirectories from the libtiff install so that VC can find the required header (.h) and static library (.lib) files. In this example they are:

$(ProgramFiles)\GnuWin32\include
$(ProgramFiles)\GnuWin32\lib

I'm using an environment variable here, but I could just as well have written the paths out in full, e.g. C:\Program Files (x86)\GnuWin32\include.

Change the "Show Directories For" dropdown to "Include files". Add the following: $(ProgramFiles)\GnuWin32\include

Now change the "Show Directories For" dropdown to "Library files". Add the following: $(ProgramFiles)\GnuWin32\lib

8. Now open the project properties window for the tesseract project (not the solution). In the solution explorer right click the tesseract project and click properties. Navigate the horrendous list of options to Configuration Properties -> C/C++ -> Preprocessor and add HAVE_LIBTIFF to the list of Preprocessor Definitions. This causes a bunch of #includes to be enabled in the code.

9. You also need to add an "Additional Dependency". In the same project properties window, go to Configuration Properties -> Linker -> Input and add libtiff.lib to "Additional Dependencies".

10. Build the solution. Watch the error list. If you get a bunch of LNK2019 (unresolved external symbol) errors, the linker can't find something tesseract references. Most likely one of the libtiff paths from step 7 or the libtiff.lib dependency from step 9 is missing. If you get an error mentioning mt.exe, you've possibly encountered a bug in the SDK. Just try building again. See http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=106634 for more info.
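
If the linker errors persist, the setup from steps 2, 3, 4 and 7 can be sanity-checked before trying another build. The Python sketch below is not part of the original instructions. The function names are made up for illustration, and several details are assumptions to verify against your own install: that libtiff's main header is tiffio.h, that the GnuWin32 DLL is called libtiff3.dll, and that every language data file is named "<lang>.<suffix>".

```python
import os


def languages_in(tessdata_dir):
    """Return the sorted language codes found in tessdata_dir.

    Assumption: data files are named "<lang>.<suffix>", so the text
    before the first dot is treated as the language code.
    """
    codes = set()
    for name in os.listdir(tessdata_dir):
        full = os.path.join(tessdata_dir, name)
        code = name.split(".", 1)[0]
        if os.path.isfile(full) and "." in name and code:
            codes.add(code)
    return sorted(codes)


def find_on_path(filename, path_value=None):
    """Return the first PATH directory that contains filename, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


def missing_libtiff_files(root, header="tiffio.h", library="libtiff.lib"):
    """Return the expected header/library paths under root that do not exist.

    The header name is an assumption; libtiff.lib is the name used in step 9.
    """
    expected = [
        os.path.join(root, "include", header),
        os.path.join(root, "lib", library),
    ]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    # Paths taken from the steps above; adjust them to your machine.
    tessdata = r"C:\projects\tesseract-2.04\tessdata"
    gnuwin32 = r"C:\Program Files (x86)\GnuWin32"
    if os.path.isdir(tessdata):
        print("languages found:", languages_in(tessdata))
    print("missing libtiff files:", missing_libtiff_files(gnuwin32))
    # The DLL name is an assumption; check the bin directory for the real one.
    print("libtiff DLL found in:", find_on_path("libtiff3.dll"))
```

An empty "missing" list and a directory (not None) for the DLL suggest the paths are in place. The check says nothing about whether Visual Studio itself has been pointed at them.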

If/when the solution builds successfully, you'll have a tesseract.exe file in the same directory as the tesseract solution file. Drag your multipage compressed TIFF here and try running tesseract. For example, if your TIFF is called in.tif, you want to output text to out.txt, and the document's language is German, then your command line would look like:

tesseract.exe in.tif out -l deu

The output file will have .txt appended to it by tesseract. If you're just recognizing English text then you can leave off the -l option, as tesseract assumes "eng" if you don't specify anything. If your TIFF file has the file extension .tiff, then tesseract will crap itself thusly:

C:\projects\tesseract-2.04>tesseract.exe in.tiff out -l deu
Tesseract Open Source OCR Engine
name_to_image_type:Error:Unrecognized image type:in.tiff
IMAGE::read_header:Error:Can't read this image type:in.tiff
tesseract.exe:Error:Read of file failed:in.tiff

Hopefully (fingers crossed, heh) you've now got an OCR'd out.txt file sitting in C:\projects\tesseract-2.04.
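
For anyone scripting this, here is a minimal Python sketch of the command line described above. It is an illustration, not something from the original instructions: prepare_image and build_command are hypothetical helper names, and copying a .tiff file to a .tif name assumes the failure shown above is purely about the file extension, as the error message suggests.

```python
import os
import shutil


def prepare_image(path):
    """Return a path ending in .tif, copying the file if it ends in .tiff.

    Assumption: tesseract only objects to the extension, so a copy under
    a .tif name is enough. An existing file with that name is overwritten.
    """
    base, ext = os.path.splitext(path)
    if ext.lower() == ".tiff":
        fixed = base + ".tif"
        shutil.copyfile(path, fixed)
        return fixed
    return path


def build_command(image, out_base, lang=None, exe="tesseract.exe"):
    """Build the argument list: exe, image, output base name, optional -l."""
    command = [exe, image, out_base]
    if lang:
        # Leaving -l off makes tesseract fall back to "eng".
        command += ["-l", lang]
    return command
```

Calling build_command(prepare_image("in.tiff"), "out", "deu") gives the same arguments as the command line above, and the list can be handed to subprocess.call from the directory that holds tesseract.exe.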
