Friday, 31 May 2013

Enterprise Content Management: Convergence of Structured and Unstructured Data Management

Enterprises are handling increasing amounts of unstructured data (electronic data that are not stored in a predefined structure, such as office documents, e-mail and web content), often kept in repositories whose structure offers limited efficiency and accessibility. Moreover, the internal structure of files is usually not standardised and may be inefficient for information retrieval and reuse. According to international studies, more than 85% of business data are unstructured.
The advent of web content, and the need to use the web channel proactively in the market, have further increased the need to manage unstructured information content efficiently. The volume of information is growing rapidly and is becoming unmanageable (info glut). In a highly competitive environment, the need to handle business information efficiently has driven efforts to improve the ways unstructured data are stored, retrieved, analyzed and reused. All these efforts aim to develop a meaningful structure that can accommodate unstructured data; in other words, to convert unstructured data into semi-structured data. Semi-structured data have more structure than unstructured data: they lack the fine granularity of fields in a relational database table, yet they are not held in a loosely and ineffectively structured repository either.
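As a rough illustration of moving from unstructured to semi-structured data, the sketch below takes a plain-text e-mail and wraps it in a light XML structure: a few named fields (sender, recipient, subject, date) plus the free-text body, which stays unstructured. It uses only the Python standard library (the email and xml.etree.ElementTree modules); the sample message and the element names are hypothetical, chosen for illustration rather than taken from any standard.

```python
from email import message_from_string
from email.message import Message
import xml.etree.ElementTree as ET

# Hypothetical unstructured input: a raw plain-text e-mail.
RAW_EMAIL = """From: anna@example.com
To: sales@example.com
Subject: Quote request for product line X
Date: Fri, 31 May 2013 10:15:00 +0000

Hello, could you send us a quote for 200 units?
"""

def to_semi_structured(raw: str) -> ET.Element:
    """Wrap a raw e-mail in XML: named header fields plus a free-text body."""
    msg: Message = message_from_string(raw)
    doc = ET.Element("email")  # illustrative element names, not a standard
    for field in ("From", "To", "Subject", "Date"):
        child = ET.SubElement(doc, field.lower())
        child.text = msg.get(field, "")
    body = ET.SubElement(doc, "body")
    body.text = msg.get_payload().strip()  # body remains unstructured text
    return doc

record = to_semi_structured(RAW_EMAIL)
print(ET.tostring(record, encoding="unicode"))
```

The fields can now be queried directly (e.g. all e-mails about a given product line), while the body is kept as is: that mix of structured fields and free text is what makes the result semi-structured rather than fully relational.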
Traditionally, the techniques and technologies used to handle structured data (DBMS, SQL) were incompatible with those used to handle unstructured data (file servers, content management systems, collaboration tools). The term Business Intelligence stems from the structured world, while the terms Knowledge Management and Content Management stem from the unstructured world. Combined retrieval and analysis of information (e.g. about a customer) from both structured and unstructured data has traditionally been carried out manually. However, the term Business Intelligence no longer refers exclusively to the structured data world: structured and unstructured data technologies are now converging. Introducing a central data repository can mitigate the negative effects of information silos, and this applies to both structured and unstructured data assets.
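To make the idea of combined retrieval concrete, here is a minimal sketch of a single "customer view" that pulls order totals from a relational table and, in the same call, lists the free-text documents that mention the customer. The schema, customer names and document names are hypothetical; SQLite (Python's built-in sqlite3 module) stands in for the structured store, and a naive case-insensitive substring match stands in for the full-text indexing a real content management or enterprise search product would provide.

```python
import sqlite3

# Structured side: a small relational table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Acme Ltd", "Widget", 1200.0), ("Acme Ltd", "Gadget", 300.0), ("Beta SA", "Widget", 450.0)],
)

# Unstructured side: free-text documents such as memos or e-mails (hypothetical).
documents = {
    "memo-017.txt": "Acme Ltd asked for a discount on the next Widget order.",
    "mail-102.txt": "Beta SA reported a delivery delay.",
    "notes-003.txt": "Follow-up call with ACME LTD scheduled for June.",
}

def customer_view(customer: str) -> dict:
    """Combine an SQL aggregate with a naive keyword match over documents."""
    rows = conn.execute(
        "SELECT product, SUM(amount) FROM orders "
        "WHERE customer = ? GROUP BY product ORDER BY product",
        (customer,),
    ).fetchall()
    needle = customer.lower()  # case-insensitive match on the customer name
    mentions = sorted(name for name, text in documents.items() if needle in text.lower())
    return {"orders": rows, "documents": mentions}

print(customer_view("Acme Ltd"))
```

The point of the sketch is the single entry point: one request returns both kinds of information about the customer, which is the task that used to be done by hand across separate systems.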
To develop a structure for handling unstructured data, an information model is needed. This model has to accommodate the needs of different user groups (customers, information users, content authors) while being organised meaningfully, e.g. per product line or per business process. Using DTDs (Document Type Definitions) or XML schemas to structure content internally through semantic tags can improve the ability to retrieve and reuse information hidden in documents. The use of sitemaps, meta tags and RSS feeds to describe the content of web sites has been expanding, especially for frequently updated content (e.g. news). RSS enables site syndication, an approach to sharing content on the web that increases its accessibility and diffusion.
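The sketch below shows both points in miniature. First, a product manual marked up with semantic tags lets every warning be retrieved in one query, wherever it appears in the document. Second, updates to that content are published as a minimal RSS 2.0 feed (rss/channel with title, link and description, plus one item per update). The tag vocabulary (manual, section, step, warning) and the example.com URLs are hypothetical; in practice the vocabulary would be defined in a DTD or XML Schema, and validating against it is not shown because Python's standard xml.etree.ElementTree does not perform DTD or schema validation.

```python
import xml.etree.ElementTree as ET

# A product manual marked up with semantic tags (hypothetical vocabulary,
# of the kind a DTD or XML Schema would define for the organisation).
MANUAL = """<manual productLine="pumps">
  <section id="install">
    <title>Installation</title>
    <step>Mount the pump on a level surface.</step>
    <warning>Disconnect power before wiring.</warning>
  </section>
  <section id="maintenance">
    <title>Maintenance</title>
    <step>Replace the filter every 6 months.</step>
    <warning>Use only approved lubricants.</warning>
  </section>
</manual>"""

def warnings(xml_text: str) -> list[str]:
    """Return the text of every <warning> element, in document order."""
    root = ET.fromstring(xml_text)
    return [w.text for w in root.iter("warning")]

def rss_feed(title: str, link: str, items: list[tuple[str, str]]) -> str:
    """Build a minimal RSS 2.0 document; each item is a (title, link) pair."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Updates to {title}"
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")

print(warnings(MANUAL))
print(rss_feed("Pump manuals", "https://example.com/manuals",
               [("Maintenance section updated", "https://example.com/manuals/pumps#maintenance")]))
```

Because the warnings are tagged by meaning rather than by layout, the same content can be reused elsewhere (a safety sheet, a web page, a feed item) without copying text by hand.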