Posted on January 8, 2009, 10:11 am, by Chris Cleveland, under News.
OpenPipeline is new open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.
It comes fully functional with prebuilt components, but also integrates third-party modules. Plugins for crawling content management systems, parsing special file formats, and performing text analytics are available.
Posted on November 15, 2010, 8:30 pm, by Chris Cleveland, under News.
Very minor bug fix release: some SQL Crawler problems fixed, better Javadoc, config option added to the web crawler.
Posted on October 29, 2010, 6:10 pm, by Chris Cleveland, under News.
Version 0.9 is out. We’ve made some architectural changes and added a WebCrawler. We’ve added a number of conveniences that people have asked for.
We expect to pick up the pace on OpenPipeline development soon, so expect a lot more.
Posted on December 14, 2009, 8:47 pm, by Chris Cleveland, under News.
We’re getting there: 0.8 is now out. Big changes in this release:
1. The Item class now carries a binary version of the document, which can be useful for transmitting and saving it in a pipeline stage. It’s also useful for the second big change.
2. DocFilters are now a stage in the pipeline. The DocFilter interface has been refactored to look like a stage. This makes it much easier to handle documents that should generate multiple items. (Imagine an XML file with multiple subitems, or a large document that should have one item per chapter.) It’s also a much cleaner design, because now connectors don’t need to know anything about DocFilters.
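To make the filter-as-a-stage idea concrete, here is a minimal sketch in Java. It is not the actual OpenPipeline API: the names `Item`, `Stage`, `ChapterSplitFilter`, `UppercaseStage`, `Pipeline` and the `---` chapter separator are all hypothetical stand-ins chosen for illustration. It only shows the shape of the design: the connector hands over raw bytes, and a filter stage may turn one item into several.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an item: it carries the raw document bytes
// plus whatever text a filter stage extracts from them.
final class Item {
    final String id;
    final byte[] binary;
    String text;

    Item(String id, byte[] binary) {
        this.id = id;
        this.binary = binary;
    }
}

// Hypothetical stage contract: one item in, zero or more items out.
interface Stage {
    List<Item> process(Item item);
}

// A filter written as a stage. It assumes chapters are separated by a
// line containing only "---" and emits one item per chapter.
final class ChapterSplitFilter implements Stage {
    @Override
    public List<Item> process(Item item) {
        String whole = new String(item.binary, StandardCharsets.UTF_8);
        List<Item> out = new ArrayList<>();
        int n = 0;
        for (String chapter : whole.split("\n---\n")) {
            if (chapter.trim().isEmpty()) {
                continue; // skip empty pieces
            }
            n++;
            Item child = new Item(item.id + "#" + n,
                    chapter.getBytes(StandardCharsets.UTF_8));
            child.text = chapter.trim();
            out.add(child);
        }
        return out;
    }
}

// Stand-in for a downstream text analyzer: it just upper-cases the text.
final class UppercaseStage implements Stage {
    @Override
    public List<Item> process(Item item) {
        if (item.text != null) {
            item.text = item.text.toUpperCase(Locale.ROOT);
        }
        return Collections.singletonList(item);
    }
}

// Runs every item produced by one stage through the next stage.
final class Pipeline {
    private final List<Stage> stages;

    Pipeline(List<Stage> stages) {
        this.stages = stages;
    }

    List<Item> run(Item input) {
        List<Item> current = Collections.singletonList(input);
        for (Stage stage : stages) {
            List<Item> next = new ArrayList<>();
            for (Item item : current) {
                next.addAll(stage.process(item));
            }
            current = next;
        }
        return current;
    }
}

final class PipelineSketch {
    public static void main(String[] args) {
        // The "connector" only supplies bytes; it knows nothing about filters.
        byte[] raw = "Chapter one\n---\nChapter two".getBytes(StandardCharsets.UTF_8);
        Pipeline pipeline = new Pipeline(
                Arrays.asList(new ChapterSplitFilter(), new UppercaseStage()));
        for (Item item : pipeline.run(new Item("book", raw))) {
            System.out.println(item.id + ": " + item.text);
        }
    }
}
```

Because the filter returns a list, splitting a document needs no special handling anywhere else in this sketch: later stages simply see more items.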
Plenty of bug fixes and small niceties added. Check it out.
Posted on August 17, 2009, 12:48 pm, by Chris Cleveland, under News.
We’ve added a wiki to the site. We’ve moved the documentation there and published a roadmap for OpenPipeline’s future. Take a look.
Posted on July 27, 2009, 2:49 pm, by Chris Cleveland, under News.
Version 0.7 is available on the download page. This is a minor bug-fix release. We’ve also added HTMLFilter and a feature or two to the FileScanner and the StageSelection modules. See the changelog for more.
Posted on June 16, 2009, 10:27 am, by Chris Cleveland, under News.
Several improvements in this new release: doc filters are now configurable, an OpenCalais stage has been added, there’s a beta ItemSender stage and ItemReceiver connector, and many small bugs have been fixed.
Posted on March 31, 2009, 12:34 pm, by Chris Cleveland, under News.
We’ve posted a minor bug-fix release to the downloads page. Build 1678 fixes a few NullPointerExceptions (NPEs) and handles versions slightly differently.
Posted on March 2, 2009, 5:29 pm, by Chris Cleveland, under News.
Version 0.5 is finally out. What’s new?
– Internally, we’ve done a complete refactoring. All the major objects have a better design. Connectors, Stages, and Items have stabilized.
– The UI has many new features and functions.
– There are new Connectors and Stages.
– It’s been in production use for a couple of months now and we’ve shaken out several bugs.
Get it on the download page.
Raritan Technologies has completed development of a Documentum connector for OpenPipeline. For sales or technical information, contact Raritan Technologies.