Google Groups is blocked here. I got through with u997 and found this good article on tesseract OCR, so I'm reposting it here for safekeeping.
Author: 火车头 Date: 2010-07-28 15:52
Since the above wiki documentation is unclear to anyone not versed in Visual C++ incantations (if you're just a managed-environment .NET developer like me, you're basically screwed), I've fluffed it up a bit here. I took the following steps to compile tesseract with compressed/multipage TIFF support on a 64-bit Windows 7 system.
1. Download tesseract 2.04. Unpack it. In this example I've unpacked to C:\projects\tesseract-2.04. Windows 7 still doesn't understand .tar.gz out of the box. My recommendation is to get a copy of 7-Zip.
2. Download your required language files. I need German and English. I unpack these to the tessdata subdirectory, C:\projects\tesseract-2.04\tessdata.
3. Install libtiff. On my (64-bit) system the suggested install directory is C:\Program Files (x86)\GnuWin32. Underneath this directory are a bunch of subdirectories containing files we'll need to compile tesseract with TIFF support, namely include, bin and lib.
4. Add C:\Program Files (x86)\GnuWin32\bin to your PATH environment variable so that the output tesseract.exe can find the libtiff DLL. Restart.
5. Open the VC solution (tesseract.sln).
6. Change the solution configuration to "Release" mode. Note that if you later change back to Debug mode, you'll need to set up all the following again...
7. In the solution explorer, right click the solution node (Solution 'tesseract') and click "Properties". Change to "Configuration Properties" and select the "Release" configuration from the dropdown at the top of the window. Then navigate to: Tools -> Options -> Projects and Solutions -> VC++ Directories. Here we'll be adding the full paths of the lib and include subdirectories from the libtiff install so that VC can find the required header (.h) and static library (.lib) files. In this example they are:
$(ProgramFiles)\GnuWin32\include
$(ProgramFiles)\GnuWin32\lib
as I'm using an environment variable. I could, however, just have written them out in full, e.g. C:\Program Files (x86)\GnuWin32\include.
Change the "Show Directories For" dropdown to "Include files". Add the following:
$(ProgramFiles)\GnuWin32\include
Now change the "Show Directories For" dropdown to "Library files". Add the following:
$(ProgramFiles)\GnuWin32\lib
8. Now open the project properties window for the tesseract project (not the solution). In the solution explorer, right click the tesseract project and click "Properties". Navigate the horrendous list of options to Configuration Properties -> C/C++ -> Preprocessor and add HAVE_LIBTIFF to the list of Preprocessor Definitions. This causes a bunch of #includes to be enabled in the code.
9. You also want to add an "Additional Dependency". Go to the "Additional Dependencies" section of the project properties (under Configuration Properties -> Linker -> Input) and add libtiff.lib.
10. Build the solution. Watch the error list. If you get a bunch of LNK2019 (unresolved external symbol) errors, that means the linker can't find something tesseract references. You're most likely missing one of the libtiff paths or the libtiff.lib dependency. If you get an error mentioning mt.exe, you've possibly encountered a bug in the SDK. Just try building again. See http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=106634 for more info.
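If the build or the resulting exe can't find libtiff, it may help to sanity-check the install before fiddling with Visual Studio again. The following is only a rough Python sketch, not part of the original steps: the helper names dir_on_path and missing_libtiff_files are made up for illustration, and it assumes the libtiff install contains include\tiffio.h (libtiff's public header) and lib\libtiff.lib (the library added in step 9).

```python
import ntpath
import os


def dir_on_path(directory, path_value):
    """Return True if `directory` appears in a Windows PATH-style string.

    Comparison ignores case, quotes and trailing backslashes.
    """
    def norm(p):
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    entries = [e for e in path_value.split(";") if e.strip()]
    return any(norm(entry) == target for entry in entries)


def missing_libtiff_files(root, exists=os.path.exists):
    """Return the expected libtiff files that are missing under `root`.

    Assumption: the header is include\\tiffio.h and the import library is
    lib\\libtiff.lib. `exists` can be swapped out for testing.
    """
    expected = [
        ntpath.join(root, "include", "tiffio.h"),
        ntpath.join(root, "lib", "libtiff.lib"),
    ]
    return [p for p in expected if not exists(p)]
```

On the build machine you could call dir_on_path(r"C:\Program Files (x86)\GnuWin32\bin", os.environ["PATH"]) to check step 4, and missing_libtiff_files(r"C:\Program Files (x86)\GnuWin32") to check that the directories from step 7 contain what the compiler and linker are looking for.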
If/when the solution builds successfully, you'll have a tesseract.exe file in the same directory as the tesseract solution file. Drag your multipage compressed TIFF here and try running tesseract. For example, if your TIFF is called in.tif, you want to output text to out.txt, and the document's language is German, then your command line would look like:
tesseract.exe in.tif out -l deu
The output file will have .txt appended to it by tesseract. If you're just recognizing English text then you can leave off the -l option, as tesseract assumes "eng" if you don't specify anything. If your TIFF file has the file extension .tiff rather than .tif, then tesseract will crap itself thusly:
C:\projects\tesseract-2.04>tesseract.exe in.tiff out -l deu
Tesseract Open Source OCR Engine
name_to_image_type:Error:Unrecognized image type:in.tiff
IMAGE::read_header:Error:Can't read this image type:in.tiff
tesseract.exe:Error:Read of file failed:in.tiff
Hopefully (fingers crossed, heh) you've now got an OCR'd out.txt file sitting in C:\projects\tesseract-2.04.
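If you end up running this over a lot of files, the command line and the .tiff quirk described above can be wrapped in a small script. This is just a minimal Python sketch under my own assumptions: the function names tif_safe_name and tesseract_args are hypothetical, and the only tesseract behaviour it relies on is what's shown above (image, output base name, optional -l language).

```python
import ntpath


def tif_safe_name(image_path):
    """Return image_path with a '.tiff' extension shortened to '.tif'.

    Other paths are returned unchanged. This only computes the name;
    copying or renaming the file is left to the caller.
    """
    root, ext = ntpath.splitext(image_path)
    if ext.lower() == ".tiff":
        return root + ".tif"
    return image_path


def tesseract_args(image_path, output_base, lang=None):
    """Build the argument list: tesseract.exe <image> <output_base> [-l <lang>].

    tesseract appends '.txt' to output_base itself, so pass 'out', not 'out.txt'.
    """
    args = ["tesseract.exe", image_path, output_base]
    if lang is not None:
        args += ["-l", lang]
    return args
```

A typical use would be to copy in.tiff to tif_safe_name("in.tiff") with shutil.copyfile, then hand tesseract_args("in.tif", "out", "deu") to subprocess.run from the directory containing tesseract.exe.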