OpenPipeline is new open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.
It comes fully functional with prebuilt components, but also integrates third-party modules. Plugins for crawling content management systems, parsing special file formats, and performing text analytics are available.
A very minor bug-fix release: it fixes some SQL Crawler problems, improves the Javadoc, and adds a config option to the web crawler.
Version 0.9 is out. We’ve made some architectural changes and added a WebCrawler. We’ve added a number of conveniences that people have asked for.
We expect to pick up the pace on OpenPipeline development soon, so expect a lot more.
We’re getting there: 0.8 is now out. Big changes in this release:
1. The Item class now carries a binary version of the document, which can be useful for transmitting and saving it in a pipeline stage. It’s also useful for the second big change.
2. DocFilters are now a stage in the pipeline. The DocFilter interface has been refactored to look like a stage. This makes it much easier to handle documents that should generate multiple items (imagine an XML file with multiple subitems, or a large document that should have one item per chapter). It’s also a much cleaner design, because now connectors don’t need to know anything about DocFilters.
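To illustrate why treating a filter as a stage makes multi-item documents easier, here is a minimal sketch. It does not use the real OpenPipeline classes: the `Item`, `Stage`, `CollectingStage` and `ChapterSplitFilter` types below are hypothetical stand-ins, and the `---` chapter separator is an assumption made purely for the example. The idea shown is that a filter that receives one item holding the document's bytes can hand any number of items to the next stage, without the connector being involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplittingFilterSketch {

    /** Hypothetical item: an id plus the raw bytes of the document. */
    static final class Item {
        final String id;
        final byte[] binary;

        Item(String id, byte[] binary) {
            this.id = id;
            this.binary = binary;
        }
    }

    /** Hypothetical stage contract: receive one item and act on it. */
    interface Stage {
        void processItem(Item item);
    }

    /** Terminal stage that simply records every item that reaches it. */
    static final class CollectingStage implements Stage {
        final List<Item> received = new ArrayList<>();

        @Override
        public void processItem(Item item) {
            received.add(item);
        }
    }

    /** A filter written as a stage: emits one item per non-empty chapter. */
    static final class ChapterSplitFilter implements Stage {
        private final Stage next;

        ChapterSplitFilter(Stage next) {
            this.next = next;
        }

        @Override
        public void processItem(Item item) {
            String text = new String(item.binary, StandardCharsets.UTF_8);
            int count = 0;
            // Assumed convention: a line containing only "---" separates chapters.
            for (String chapter : text.split("\n---\n")) {
                if (chapter.trim().isEmpty()) {
                    continue; // skip empty chapters
                }
                String childId = item.id + "#" + count++;
                next.processItem(new Item(childId, chapter.getBytes(StandardCharsets.UTF_8)));
            }
        }
    }

    public static void main(String[] args) {
        CollectingStage sink = new CollectingStage();
        Stage filter = new ChapterSplitFilter(sink);
        byte[] doc = "Chapter one\n---\nChapter two\n---\nChapter three"
                .getBytes(StandardCharsets.UTF_8);
        filter.processItem(new Item("book", doc));
        for (Item child : sink.received) {
            System.out.println(child.id + ": " + new String(child.binary, StandardCharsets.UTF_8));
        }
    }
}
```

In this sketch the code that produced the original item knows nothing about the filter. It only passes the item to whatever stage comes first, which is the separation the 0.8 design describes.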
Plenty of bug fixes and small niceties added. Check it out.
We’ve added a wiki to the site. We’ve moved the documentation there and published a roadmap for OpenPipeline’s future. Take a look.
Version 0.7 is available on the download page. This is a minor bug-fix release. We’ve also added HTMLFilter and a feature or two to the FileScanner and the StageSelection modules. See the changelog for more.
Several improvements in this new release: doc filters are now configurable, an OpenCalais stage has been added, there’s a beta ItemSender stage and ItemReceiver connector, and many small bugs have been fixed.
We’ve posted a minor bug-fix release to the downloads page. Build 1678 fixes a few NPEs and handles versions slightly differently.
Version 0.5 is finally out. What’s new?
– Internally, we’ve done a complete refactoring. All the major objects have a better design. Connectors, Stages, and Items have stabilized.
– The UI has many new features and functions.
– There are new Connectors and Stages.
– It’s been in production use for a couple of months now and we’ve shaken out several bugs.
Get it on the download page.
Raritan Technologies has completed development of a Documentum connector for OpenPipeline. For sales or technical information, contact Raritan Technologies.