DasBlog: Notes from Building a Distributed .NET Collaboration System

 

Clemens Vasters
newtelligence AG

January 2004

Summary: Expounds the merits of weblogs as a means of sharing knowledge, and describes the lessons learned from designing and implementing a weblog engine built using Microsoft .NET technologies. (16 printed pages)

Contents

Part 1: The Weblog Phenomenon
Weblogs in the Software Business
Weblogs as an Enterprise Communication and Collaboration Tool
Part 2: Weblog Technologies and Applications
RSS Publishing and Aggregating Information via XML
Referrals Sparking Discussion and Interaction
dasBlog: Implementing a Weblog Engine
dasBlog: Requirements, Considerations, and Solutions
Summary
Resources

Keeping a diary seems mostly an obsession of teenage girls, politicians, and people who are planning to write their memoirs at a later point in life. A diary is the place to write down a daily or weekly snapshot of very private thoughts that are nobody else's business. No matter whether the diary is kept by a 15 year old girl who is desperately in love with rock star Robbie Williams, or by the German chancellor, all diaries have one thing in common—they are top secret, locked away and never shared with anyone.

Looking at the newest trend on the Web, we might need to revise, or at least widen, our definition of a diary. Weblogs (usually shortened to 'blogs') are the digital equivalent of the personal diary, and have taken the Internet by storm over the past two years. Mid-2003 estimates by blogcensus.net and blogcount.com put the total number of weblogs somewhere between 1.3 and 2.2 million, and these numbers are growing rapidly. The two most striking differences between diaries and weblogs are that weblogs are not only for teenage girls or politicians, and that they are not secret at all. On the contrary, weblogs are out there for everyone to see!

So why is this topic being discussed in a magazine for software architects and information technology managers? There are two main reasons: First, there are a lot of architectural lessons that can be learned from the weblog phenomenon and from the technologies that make the weblog universe tick. In fact, the weblog space as a whole has already grown to be the largest distributed XML and Web services application in existence. Second, weblogs are becoming a strategic tool to improve communication and collaboration in the enterprise that may eventually turn out to be just as important as email.

In the first part of this article, I will analyze and explain the weblog phenomenon and give some examples of how weblogs can be used as collaboration systems in business environments. In the second part, I will take a closer look at weblog-related technologies, examine a concrete example of a weblog engine in the form of newtelligence's free Microsoft® ASP.NET-based dasBlog application, and share some of the architectural lessons newtelligence learned from implementing, deploying and running weblogs on this platform.

Part 1: The Weblog Phenomenon

Before we go into the more technical aspects, we should spend a bit of time analyzing why weblogs have become so wildly popular and why some people are publishing their once secret diary on the Internet, where it can potentially be read by anyone.

Keeping a diary, and therefore keeping a weblog, is a very good idea, indeed. Formulating thoughts and ideas in writing requires more serious and intense consideration of a topic than just thinking about it. Keeping track of thoughts, ideas, problems, and solutions over time helps build a great growing resource of personal experience that you can turn back to in order to look up all of those details that were once well thought out and known but now almost forgotten.

However, being a good idea is not enough of a reason to explain the sudden popularity of weblogs. What makes weblogs really popular is that they provide a near-perfect personal publishing approach for everybody. Even with the most recent generation of HTML editing tools, maintaining a private Web site is far too complicated for most people, and the result is often disappointing when compared to the polished Web sites created by professional Web designers. Even if the technical design challenges can be overcome, there is often a 'content dilemma' regarding the information that should actually be included in the home page. Unless the owner is a fanatic hobbyist keen to share his expertise about postage stamps, model railways or his favourite movie star, the home page creation efforts usually result in little more than a few pictures, a personal greeting such as 'Welcome to my home on the Web' and some favourite links, with the frustrating consequence that nobody will ever look at it.

Weblogs, or more precisely the tools that are used to create weblogs, help overcome these hurdles to personal publishing. Most popular weblog tools are in fact quite sophisticated personal content management systems that take care of rendering content into a professional-looking HTML template. They usually provide content categorization as well as topical and historical navigation capabilities. This functionality, together with their ease of use, also seems to solve the content dilemma. If publishing a quick thought of no more than three lines onto a Web site takes no more effort than writing an email, then it's more likely that it will happen. The ease of publishing and the ability to publish random thoughts quickly result in a qualitative difference between home pages and weblogs: Home pages help show off HTML skills or knowledge and admiration for a particular subject; weblogs exhibit personality and tell an ongoing story.

Moreover, weblogs have turned into a public discussion platform where thoughts and ideas can be exchanged. Because weblogs are formatted as hypertext, they can link to other weblogs and to any other Web pages the authors want to highlight to their readers, which enables weblog authors to comment publicly on other authors' entries. Many weblog tools also facilitate the collection of per-topic reader comments, allowing everybody, not only 'Bloggers', to participate in discussions.

Weblog tools expose the content not only as Web sites, but also in XML. The most commonly used format is the Really Simple Syndication (RSS) 2.0 format (also referred to as RDF Site Summary or Rich Site Summary).

XML and RSS allow weblog readers using special tools to consolidate all content of interest into a local content store that's searchable, easily navigable, and provides pre-categorized views that are also available offline. The NewsGator plug-in for Microsoft Outlook® even goes so far as to integrate this ability into your everyday email client.

Weblogs in the Software Business

Recently, weblogs have moved well beyond being just a trendy tool for personal publishing and community discussion. Corporate knowledge and information workers already benefit greatly from weblogs as an information source today. Software developers and architects as well as sales and marketing employees quite often find weblog links amongst the top 10 entries of a Web search result, and sometimes weblog entries seem to be the only good source for certain information.

Technology companies like Microsoft and Sun even have dedicated portals where their employees can host their weblogs. However, while the Microsoft portal is quite liberal about weblogs and only provides a set of links pointing to its employees' personal blogs, which reflect their personal opinions and not the 'official' Redmond position, the Sun portal and weblogs are much more of a developer marketing and developer education tool with a personal touch. Both approaches have upsides and downsides and neither can be called better or worse. Sun's primary interest is a consistent, polished external corporate message, while Microsoft's approach is to get and retain traction with customers on an informal, personal level and give them an uncensored insight into what's happening behind the scenes.

Both the Microsoft and Sun approaches to public corporate weblogs fill a gap in corporate messaging. The weblog format, the personal nature, and the ease of publishing allow authors to post small, informative snippets about a product or solution path and, as demonstrated by the very liberal Microsoft approach, sometimes even without going through the usual corporate review, editing and publishing cycle. It's obvious that even companies as large as Microsoft cannot document every problem solution and workaround using the 'official channels', with all of the requirements around localization, consistency and publication processes. Letting employees augment the official documentation with their own insight is a brilliant way to complete the picture and fill those gaps.

The corporate use of weblogs is most visible in the software industry, but it's not restricted to it. There are signs that weblogs are starting to fulfil similar roles in other fields of business such as law, the media industry, the fashion industry and various fields of engineering. Weblogs help bring a more personal touch to the corporate message and are a very valuable marketing and public relations tool that demonstrates the personal competence and abilities of the people who are the drivers behind a company.

Weblogs as an Enterprise Communication and Collaboration Tool

An even more attractive application of weblogs for corporations is information distribution and collaboration inside the corporate network. Although many enterprises have embraced and implemented portal solutions and 'groupware' as information distribution and discussion platforms for a long time now, weblogs have the potential to substantially enhance the corporate information ecosystem and foster a more transparent corporate culture.

Let's look at a few examples of how weblogs and related technologies can be used in a corporate environment and how that use differs from more established solutions:

  • Personal weblogs. Personal weblogs are the most obvious application but, depending on the corporate culture, also have the most potential for conflict. Personal weblogs are a 'personal portal' and allow individuals to track the results of their own work and information they gather from third-party information sources such as external websites. Due to its chronological nature, a personal weblog also provides a great way of documenting the history of decision processes and of assumptions about future events, and therefore provides a great foundation for post-mortem analysis in the event of success or, more importantly, failure. The conflict potential lies in issues like content ownership, supervision, freedom of personal expression, disciplinary consequences and similar issues.
  • Topic-Centric Knowledge weblogs (K-Logs). Knowledge Logs are not person-centric but topic-centric. They are based on the same technology as weblogs, but have multiple authors, usually from one team, but sometimes across teams or even divisions. K-Logs focus on certain subject areas and enable the aggregation of information and references to topical content as a growing and chronological repository. In this function K-Logs have substantial overlap with classic knowledge portal solutions. What makes K-Logs appealing is the relatively low software acquisition cost, the ease of use and the ability to distribute and aggregate the content via RSS. These advantages will be examined in detail later in this article.
  • Team weblogs. Team weblogs are team centric and track the progress of the development of a certain product or project.
  • Automated weblogs. Using features like 'Mail-To-weblog' (discussed later in this article) or integrating with Web services exposed by weblog engine software, weblogs can serve as an easy-to-install and easy-to-maintain publishing point for automatically generated information. In the software industry this information could include daily reports generated from automated build and test processes, in manufacturing processes it could be statistical information, and so on. The benefit of automated weblogs is that the effort for publishing such information in an accessible way is minimized and it is sufficient for the information provider to supply very simple plain-text fragments.

An analysis of your own corporate knowledge capturing and distribution needs might yield more possible applications of weblogs. In addition to this, subscription-based RSS information services can help to improve information distribution inside the company without flooding email inboxes. Examples of this are automatically generated daily or hourly reports about system or machine activities, but also mundane yet important things such as today's menu in the company's canteen or the latest scores of the company football team.

Part 2: Weblog Technologies and Applications

The 'Blogosphere', as the weblog space is also often called by weblog aficionados, is powered by a set of core technologies and techniques that we need to explain before we can move on to the details of the concrete implementation in newtelligence's dasBlog.

RSS Publishing and Aggregating Information via XML

The most important weblog technology is undoubtedly the Really Simple Syndication (RSS) XML format that has already been mentioned earlier in this article. RSS was initially created by Userland Software and Netscape as the XML format behind the 'Sidebar' feature of Netscape Navigator 6.0 that was the follow-up technology to the 'Netcaster' in the 4.0 generation of Netscape's product. RSS can be seen as a response to Microsoft's XML-based Channel Definition Format (CDF) for 'Active Channels' and the 'Active Desktop' that had already been introduced in Microsoft® Internet Explorer 4.0, and both serve approximately the same purpose:

The common idea behind RSS and CDF was to provide machine-readable indices for websites that could be picked up by Netscape Navigator and Internet Explorer, allowing the browsers to display the current site highlights and headlines either in the Netscape Sidebar, in Internet Explorer's Favourites View or on the Microsoft® Windows® 'Active Desktop'. The vision and promise was that all news headlines and articles that a user was interested in could be imported in a quick online session and were then available offline, primarily as a convenient workaround for expensive, pay-per-minute Internet access. The long defunct Pointcast Network had a similar approach and was also (in part) using the CDF format to acquire information from its sources and to push subscribed channels to its client software. Unfortunately, none of these products and features were blessed with any great success; Pointcast went belly up, and 'channels' built for either Netscape or Internet Explorer have become an extinct species. RSS survived thanks to the continued efforts of Userland Software and some other vendors focusing on small-scale content management systems, from which today's weblog tools eventually evolved.

Despite its popularity and the fact that RSS has accumulated a critical mass of adoption that makes it hard to replace, it is widely recognized that RSS has several critical deficiencies requiring changes or additions. RSS lacks proper support for XML Namespaces, uses RFC 822 style dates rather than the ISO 8601 time format used by XML Schema, has no normative XML Schema or even Document Type Definition, and the specification itself is ambiguous and lacks formality. These issues have prompted the formation of a working group around the IBM engineer Sam Ruby that is working to replace RSS and to consolidate most weblog technologies into a set of specifications under the name Atom.
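To make the format and its date handling more tangible, here is a minimal sketch of what an aggregator does with an RSS 2.0 document, using only the Python standard library. The sample feed, its titles and its URLs are invented for illustration, and the function name read_items is an assumption of this sketch rather than part of any weblog tool. The sketch reads each item and converts the RFC 822 style pubDate into an ISO 8601 timestamp.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A hand-written RSS 2.0 sample; the titles and URLs are made up.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <description>Illustrative feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/2003/07/20/first</link>
      <pubDate>Sun, 20 Jul 2003 14:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link, ISO 8601 timestamp) for each item in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # pubDate uses the RFC 822 mail date format, so the mail parser reads it.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        items.append(
            (item.findtext("title"), item.findtext("link"), published.isoformat())
        )
    return items
```

A real aggregator would also have to cope with missing pubDate elements and with the namespaced extensions found in many feeds, both of which this sketch ignores.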

Referrals Sparking Discussion and Interaction

The public interaction between Bloggers that emerges in the Blogosphere is one of its greatest appeals and motivators. Discussions on certain topics quite often involve dozens of authors who independently publish their own views on their weblogs, but by citing other authors and linking to their respective weblogs they jointly create a hyperlinked mesh of views, information and opinions on a given topic.

Discussions spanning multiple weblogs are formed by no rules other than chaos. This is distinctively different from discussions in Internet newsgroups, on mailing lists, or in public folders in a Groupware system. In these, discussion participants must be active subscribers of a certain group and the discussion usually goes unnoticed outside of the group. Google's newsgroup archive exposes newsgroup discussions to the Web to some degree, but still requires the user to explicitly search either in a particular newsgroup or for certain keywords.

The primary tool aiding the chaotic formation of discussion and interaction is a simple and well-known mechanism supported by all common Web browsers: the HTTP Referrer header (spelled 'Referer' in the HTTP specification). When someone comes across an interesting tip, a thought-provoking article or an opinion, they post their applause, concerns or supporting information to their own weblog, and in doing so add a hyperlink to the cited post on the other weblog. Because almost all weblog tools recognize and log the Referrer header, notifying the author of the original entry is as simple as clicking that link in a browser: the Referrer header sent with the resulting request contains the URL of the page where the hyperlink was set. Most popular weblog engines track and consolidate the referrals in easily accessible lists, ranking them by the number of visitors that have arrived at the weblog through the external links and rendering the referrer URL as a clickable hyperlink; some weblog engines can even notify their owner by email of every such referral. In that way, an author learns about external comments or citations and can post his own responses or additional comments, in the process linking back and possibly citing and linking to weblogs of third parties for examples or supporting opinions, causing the on-topic mesh to form and spread.
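The referral ranking described above can be sketched in a few lines. This is an illustration only, not code from any particular weblog engine: the function name rank_referrals, the list of header values and the host names are assumptions. The sketch takes the Referer values seen on incoming requests, drops empty values and links from the weblog's own host, and orders the rest by visitor count.

```python
from collections import Counter
from urllib.parse import urlsplit


def rank_referrals(referer_values, own_host):
    """Count external referrer URLs and return them ordered by visitor count."""
    counts = Counter()
    for value in referer_values:
        if not value:
            continue  # header absent, e.g. a typed-in URL or a bookmark
        if urlsplit(value).hostname == own_host:
            continue  # navigation inside the weblog is not a referral
        counts[value] += 1
    # most_common() sorts by count, highest first
    return counts.most_common()
```

Each URL in the resulting list is a page that links to the weblog, which is exactly what an author needs in order to find and answer external comments.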

Because hyperlink referrals still require manual intervention in order to trigger notification, two additional and more instant notification mechanisms have gained the support of the weblog community and tool builders: Pingback and Trackback.

Pingback allows implementing automatic notifications between weblog engines without having to rely on HTTP referrals. Pingback defines a Web service interface (using the XML-RPC Web services protocol, not its successor SOAP) and two auto-discovery mechanisms. The operating principle is very simple: When the weblog author posts a new entry to their weblog, the engine looks at the submitted HTML fragment and scans it for hyperlinks. It will then issue an HTTP GET request to each of those links, using one or both of the auto-discovery mechanisms, looking for an HTTP header or a special tag embedded in HTML, in order to find out whether the link target supports the Pingback protocol. If a Pingback endpoint is detected, the engine will submit a ping Web service call, supplying the URLs of both the pinged and the pinging weblog entry. Pingback has the advantage of instant notification about citations and, just as important, about changes to these citations.
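The two steps of the protocol can be sketched as follows. The X-Pingback response header, the link element with rel="pingback" and the pingback.ping(sourceURI, targetURI) method come from the Pingback specification; the function names and the decision to pass in already fetched headers and HTML are assumptions of this sketch, which performs no network access and does not unescape HTML entities in the discovered address.

```python
import re
import xmlrpc.client

# Pattern for <link rel="pingback" href="..."> in the form the specification describes.
_LINK_RE = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')


def discover_endpoint(headers, html):
    """Return the Pingback endpoint advertised by a fetched page, or None."""
    # Mechanism 1: the X-Pingback HTTP response header.
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value
    # Mechanism 2: a link element embedded in the HTML.
    match = _LINK_RE.search(html)
    return match.group(1) if match else None


def build_ping(source_url, target_url):
    """Serialize the pingback.ping(sourceURI, targetURI) XML-RPC request body."""
    return xmlrpc.client.dumps((source_url, target_url), methodname="pingback.ping")
```

An engine would run discover_endpoint for every hyperlink found in a new entry and post the result of build_ping to each endpoint it finds.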

Trackback aims to provide similar functionality, but with a slightly different spin. The protocol does not only provide the URL of the pinging entry, but optionally also the title and a short excerpt of the source entry along with the weblog's name. Contrary to Pingback, which is fully automatic, Trackback is typically used as an explicit, on-demand notification mechanism.

The major technical difference between Pingback and Trackback is that Pingback employs an XML-RPC Web service interface, while a Trackback ping is technically equivalent to submitting a form in a browser: the information is posted using an HTTP POST request employing the application/x-www-form-urlencoded content type, precisely as is the case with HTML forms. Although different, both protocols succeed in achieving their goal: improving collaboration and communication.
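For comparison, a Trackback ping body can be sketched like this. The field names url, title, excerpt and blog_name are those of the Trackback specification; the function name and the example values are assumptions of this sketch, and the actual HTTP POST is left out.

```python
from urllib.parse import urlencode

# The content type an HTML form (and a Trackback ping) is posted with.
CONTENT_TYPE = "application/x-www-form-urlencoded"


def build_trackback_body(url, title=None, excerpt=None, blog_name=None):
    """Encode a Trackback ping exactly like a submitted HTML form."""
    fields = {"url": url}  # the URL of the pinging entry is the only required field
    optional = {"title": title, "excerpt": excerpt, "blog_name": blog_name}
    fields.update({key: value for key, value in optional.items() if value is not None})
    return urlencode(fields)
```

The resulting string is sent as the body of a POST request to the Trackback URL published for the cited entry, with CONTENT_TYPE as the Content-Type header.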

To enable people who do not have their own weblog to participate in weblog discussions, most weblog engines support user comments that can be added using a Web-based interface. Additionally, a widely adopted Web services interface for comments exists: the Comment API. The Comment API is directly supported by some of the popular RSS aggregators like RSS Bandit, which allow readers to post comments straight from the tool.

dasBlog: Implementing a Weblog Engine

Early in 2003, we at newtelligence decided that building our own weblog engine would be a good thing to do. There were several motivations. As the most active and likely best-known weblog author at newtelligence, I primarily wanted a replacement for my previous weblog tool for myself, with the side effect that my colleagues at newtelligence could use it too. Writing our own blog engine also promised to give us a great set of example code to use for developer education and a platform to try out new technologies and techniques. Finally, developing a solution that supports all of the collaboration Web services described above, and a few more, seemed like a great experiment to participate in, and allowed us to research the reality of a distributed system that already implements a great deal of the Web services vision.

At the time I got around to implementing dasBlog in July 2003, with just 5 calendar weeks allocated to complete the job, there were two major weblog engines for the Microsoft® .NET Framework, our default platform: The engine powering https://weblogs.asp.net (now called '.Text'), for which code was not available at the time, and the engine BlogX, which had been thrown together by a couple of people at Microsoft in their spare time along with some community contributors.

The 'make or take' decision was a relatively easy one, because BlogX was already a working implementation with a file-based backend store and was relatively lightweight in terms of existing features, providing a good skeleton that made refactoring and adding new features relatively easy. Also, the license conditions for BlogX were largely equivalent to the BSD license, which we also favour for work that we publish for free and in source code form.

While our initial intent was to merge our changes back into the original BlogX code base, it turned out that the refactoring process led to the elimination of almost all of the original BlogX source code as the project progressed. Because merging the result into the code base would have amounted to a hostile takeover of that community project, we decided to give it a new name and to maintain it as a separate project, and so dasBlog was born. Version 1.0 of the new code base went public after 3 weeks, with the follow-up versions leading up to the complete feature set in version 1.2 being released after 5 weeks. That was on time, and with a stability and quality that has convinced several dozen bloggers to abandon their old tools and switch to the dasBlog engine even in the early stages of the project.

dasBlog: Requirements, Considerations, and Solutions

Fundamentally, dasBlog is a small content management system that's directly bound to a rendering engine, which renders all content just in time and based on the view that a visitor chooses.

Storage

The primary task of a weblog system is to capture and present a chronology of events. The front page of a weblog therefore presents a configurable number of weblog entries, chronologically ordered, with the most recent entries at the top. This fundamental principle also requires that visitors can easily navigate through the history of the weblog by date. This primary function immediately influences the design of the backend store, for which the most common lookup criterion is a date or a time span. At the same time, it must be possible to efficiently access individual weblog entries by their identifier in order to attach 'tracking' information such as referrals, pingbacks and trackbacks, and to associate and display comments as explained earlier in this article.

Because a weblog is a person-centric, not topic-centric publishing point, it is also required to enable and ease by-topic navigation by introducing a categorization scheme. Creating and maintaining categories should be largely automated and should not require much administrative effort on behalf of the user.

Fulfilling these requirements would be very easy with a relational database system and a few simple indexes. However, to achieve the desired ease of initial deployment and future upgrades coupled with minimal administrative effort for anyone with a low-cost, ASP.NET enabled account at a web-hosting company as well as for users on corporate desktop machines running a local web server as their own publishing point, it's not a good idea to depend on the existence of a full database system. Instead, the backend is factored in a way that a database could be supported if that requirement should arise, but the best and primary storage mechanism is quite simply the file system. This could have been achieved by using a file-based database like Microsoft's well-known Jet or FoxPro engines, but such a built-in dependency would limit extensibility and impact the efforts required for upgrading to newer versions of the software as they appear. Once the road down the database route is taken, any changes or additions to the storage require schema updates for databases in the installed base, substantially increasing the administrative effort.

The resulting architectural decision, which was already pre-defined by the original BlogX code base and which we consequently decided to stick with, was to store all information in XML files in a subdirectory of the application. Because the lookup criterion is based on time, or at least a time interval, the content is stored following a 'one day, one file' scheme and the index is simply the file system's directory information: The file names contain the date. The auxiliary indexes for the categories and the entries' unique identifiers are stored in separate files. Additional information such as tracking data (referrals, pingbacks, and trackbacks) and comments are stored in files that are also named (and thus indexed) by date, but are kept separate from the actual content in order to limit concurrency issues and to address the differences in their characteristics:
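A small sketch shows why the directory listing is enough to serve as the date index. The file suffixes used here are hypothetical and chosen for illustration; the text above only states that the file names contain the date. Because an ISO-style date prefix sorts lexicographically in date order, a time span lookup becomes a plain string comparison on file names.

```python
from datetime import date

# Hypothetical suffix for content files; tracking data would use a different one.
CONTENT_SUFFIX = ".dayentry.xml"


def day_file_name(day):
    """Name of the content file for one day, e.g. 2003-07-20.dayentry.xml."""
    return day.isoformat() + CONTENT_SUFFIX


def files_for_range(existing_names, start, end):
    """Pick the content files whose date falls between start and end (inclusive)."""
    low, high = day_file_name(start), day_file_name(end)
    return sorted(
        name
        for name in existing_names
        if name.endswith(CONTENT_SUFFIX) and low <= name <= high
    )
```

In a running engine, existing_names would come from a directory listing of the content folder, and days without entries simply have no file.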

The core content has a very low update frequency (a few times a day), has very many reads, and must never be lost. Tracking information is updated very often, potentially concurrently, has many reads, and is less critical. Whenever changes to the core content occur, the engine persists them synchronously to be able to report any errors straight back to the user. These changes also cause all in-memory caches to be discarded and the auxiliary indexes to be rebuilt. All trackings, however, are processed asynchronously on a single secondary thread that is sequentially fed information through an in-memory queue. This can be done, because the timeliness requirements for trackings are very relaxed: They need to be reflected in the weblog eventually, but they don't need to appear as the event occurs.
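The asynchronous handling of trackings can be illustrated with a minimal sketch: one background thread drains an in-memory queue, so request threads never wait for tracking data to be written and writes never run concurrently. The class name TrackingWriter and the persist callback are assumptions of this sketch and are not taken from the dasBlog source.

```python
import queue
import threading


class TrackingWriter:
    """Feeds tracking records to a single worker thread through a queue."""

    def __init__(self, persist):
        self._persist = persist  # callable that stores one tracking record
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, tracking):
        """Called from request threads; returns immediately."""
        self._queue.put(tracking)

    def _run(self):
        while True:
            tracking = self._queue.get()
            try:
                self._persist(tracking)
            except Exception:
                pass  # trackings are less critical, so a failed write is dropped
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until everything submitted so far has been processed."""
        self._queue.join()
```

Because only the worker thread calls persist, records are written strictly in the order they were queued, which is what makes the relaxed, eventually consistent handling safe.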

Because a new file is created each day, the resulting file sizes are quite small (typically substantially less than 100KB), and writes are quick with minimal locking. An aggressive approach to caching allows synchronous updates to the in-memory caches and the backend store, especially for the tracking information, and therefore further reduces concurrency problems. It should be noted, however, that the chosen storage model and the interaction between the in-memory caches and the backend store limit clustering or otherwise having multiple engines share a common store. That's a deliberate and acceptable restriction, because even very popular weblogs usually get only a few tens of thousands of hits per day. This is aided by the fact that most users read weblogs through RSS aggregators and, due to infrequent updates of the core content, the RSS streams can be easily cached on the server side and even by upstream proxies.

Content Management

While we have already discussed the storage strategy, we haven't yet covered how content is actually submitted to the engine. Here again, dasBlog had to fulfil multiple requirements for a variety of different usage scenarios.

The most obvious way to submit content into a Web-based application is to use a form on a Web page. dasBlog supports this for all browsers, but gives Microsoft's Internet Explorer preferred treatment for a few quite simple reasons: Users that access the editor web pages using Microsoft Internet Explorer are provided with a page that includes a set of client scripts and an inline frame, utilizing Internet Explorer's inherent ability to act as an HTML editor. With this, users get rich, in-browser text editing capabilities and can style text using several fonts, typographic effects and colours. The editor also supports attaching files and embedding pictures. The binaries are uploaded using the standard HTML upload control and stored in a special directory below or alongside the content directory. Once the upload is complete, the picture or a hyperlink is inserted into the current text.

For users with other browsers, such as Opera or the Mozilla browser, the Web-based editing capabilities are unfortunately much more restricted. With these browsers, users only get a standard multi-line text field and must write HTML markup explicitly. The decision to go with such a limited version for non-Microsoft browsers is based on Internet Explorer's market share, the assumption that the users of dasBlog will run Windows on their desktops, and the fact that HTML editing support is not standardized across browsers. However, this limitation isn't as significant as it might appear at first sight, because using the Web form is only one of three ways to submit content into the engine.

The first alternative to submitting content through a browser is to use one of a variety of offline weblog editors such as Zempt or w.bloggar that directly support the Web services API developed by the makers of the popular weblog environment 'Blogger'. Any editor that can target Blogger servers can also target dasBlog. dasBlog implements the Blogger API along with extensions made for the competing MovableType weblog software and extensions made by Userland, so that a wide range of tools can be used both to submit new entries and to edit existing entries from a rich client.

The Blogger API and its various extensions define Web services endpoints that do not use the SOAP protocol, but rather use XML-RPC. The Atom working group that has already been mentioned in the previous discussion on RSS is planning to consolidate the partially overlapping functionality of what are essentially three APIs and define the resulting API to be SOAP-based. However, the server-side XML-RPC protocol endpoints are sufficiently easy to target by client applications, because there are plenty of pre-built libraries supporting the protocol for practically all platforms that matter, even if these libraries are not too well known. This entry point to the content management backend is not only well suited for interactive editing tools, but also as an interface to push automatically generated content into dasBlog from any other application and even to synchronise content between weblogs.
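As an illustration of how little a client needs, the following sketch builds the XML-RPC request for the Blogger API's blogger.newPost call with Python's standard library. The method name and parameter order (appkey, blogid, username, password, content, publish) follow the published Blogger API; the function name and all values passed to it are placeholders invented for this sketch.

```python
import xmlrpc.client


def build_new_post(blog_id, username, password, content, publish=True, app_key=""):
    """Serialize a blogger.newPost request body."""
    # Parameter order as defined by the Blogger API.
    params = (app_key, blog_id, username, password, content, publish)
    return xmlrpc.client.dumps(params, methodname="blogger.newPost")
```

Against a live server, the same call could be made through xmlrpc.client.ServerProxy pointed at the engine's XML-RPC endpoint; the address of that endpoint depends on the installation and is not given here.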

The second alternative to submitting content through a browser, and by far the most attractive one, is email. When the dasBlog web application starts, it spins up a dedicated thread that will watch, in configurable intervals, a POP3 mail account defined by the weblog owner. Whenever the engine polls the account, it processes every email item in the account, looking for emails whose subject line is prefixed with a configured passphrase. Emails with a matching passphrase are added to the content store. This is obviously a minimalist security measure and the passphrase can even be empty, allowing any email to be published. dasBlog can handle HTML and plain-text formatted messages, and extract, store and link attachments, handle embedded pictures, and through a configuration switch create and embed thumbnails for picture attachments.
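The passphrase check can be sketched as follows. This is an illustration under simplifying assumptions, not the dasBlog implementation: the function name extract_entry is invented, the message is given as raw text instead of being fetched over POP3, and only an undecoded text/plain part is read, so attachments, HTML bodies and encoded payloads are out of scope.

```python
from email import message_from_string


def extract_entry(raw_message, passphrase):
    """Return (title, body) if the subject starts with the passphrase, else None."""
    message = message_from_string(raw_message)
    subject = message.get("Subject", "")
    # An empty passphrase matches every subject, so any email would be published.
    if not subject.startswith(passphrase):
        return None
    title = subject[len(passphrase):].strip()
    # Take the first plain-text part; a simple message is its own only part.
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            return title, part.get_payload().strip()
    return title, ""
```

The sketch also makes the weakness mentioned above visible: the passphrase travels in the subject line in clear text, so it filters out accidental mail rather than providing real authentication.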

Email support provides the most flexible and instantly interoperable model for adding information to the weblog, and is readily supported on every platform. It allows content creation and submission anywhere, with familiar tools including SMS and MMS messages sent from mobile phones through an email gateway. With support for HTML and picture embedding and using state-of-the-art email tools like Microsoft Office Outlook 2003, publishing rich content to the Web becomes easier than ever.

Rendering Engine and Localization

The rendering engine is responsible for formatting the content for Web presentation. The core requirements are easy to define: Navigation through the site should be easy and obvious for all visitors, the site should be accessible using virtually every current Web browser on any platform, and it should be possible to easily customise and enhance the site's visual design.

These requirements led to an approach where dasBlog borrows heavily from the popular Radio Userland weblog-tool, and is indeed largely compatible with design templates created for that tool. The reason for taking this route was that there are many free and ready-to-use design templates available for Radio Userland, and also for Userland's Manila content management system, and allowing reuse of these immediately provided a variety of widely known and appealing visual themes that fit many personal tastes. In addition to having a broad selection of ready-to-use themes, the basic navigation scheme is therefore also aligned with a large number of other public weblogs, providing the desired instant familiarity and ease-of-use.

The installable version of dasBlog comes with a set of these templates already configured for use—enabling the user to focus on content and not on technical details of HTML right from the start. Still, because many users want to give their weblogs a personal touch and are not afraid of HTML, customisation beyond simply selecting a template must be very simple.

Design templates for dasBlog use a combination of simple HTML, Cascading Style Sheets (CSS) and a set of macros that is implemented by the engine. A design template (or theme) always consists of three simple HTML files: homeTemplate.blogtemplate (or .txt or .html), dayTemplate.blogtemplate and itemTemplate.blogtemplate. The homeTemplate is used to render the content framework for every page, the dayTemplate is used as a frame for the content of a certain day and the itemTemplate is used to render individual entries. The engine also supports separate templates for each content category.
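The nesting of the three templates can be illustrated with a small Python sketch. The template contents, the macro names, and the escape-sequence syntax used here are hypothetical stand-ins; the engine's actual macro set is described in its documentation.

```python
import re

# Assumed escape-sequence syntax for this sketch: <%macroName%>
MACRO = re.compile(r"<%\s*(\w+)\s*%>")

def expand(template, values):
    """Replace each macro with its value; unknown macros are left untouched."""
    return MACRO.sub(lambda m: values.get(m.group(1), m.group(0)), template)

# Hypothetical template contents and macro names.
home_template = "<html><body><h1><%siteName%></h1><%bodytext%></body></html>"
day_template = "<h2><%date%></h2><%items%>"
item_template = "<div class='item'><h3><%title%></h3><%content%></div>"

def render_page(site_name, days):
    """days maps a date string to a list of entry dicts with 'title' and 'content'."""
    day_html = []
    for date, entries in days.items():
        # Each entry is rendered with the item template ...
        items = "".join(expand(item_template, entry) for entry in entries)
        # ... the entries of one day are framed by the day template ...
        day_html.append(expand(day_template, {"date": date, "items": items}))
    # ... and the home template provides the frame for the whole page.
    return expand(home_template, {"siteName": site_name, "bodytext": "".join(day_html)})

page = render_page("My Weblog", {"2004-01-15": [{"title": "Hello", "content": "<p>First post</p>"}]})
```

Keeping the page frame, the day frame, and the entry layout in separate files means a theme author can restyle one level without touching the other two.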

The templates themselves are fairly easy to customize using a standard HTML editor with Cascading Style Sheet support. The supported macros are well documented and can be inserted into the page using special escape sequences. All dynamic elements such as the calendar or the category lists are partially hard-wired into the rendering engine because of their complexity, but their appearance can be extensively customized using Cascading Style Sheets.

Another concern and requirement for dasBlog was to have good support for localization. Because newtelligence is a German company with customers all across the EMEA region, it was important for us to support full localization into German and English for ourselves and into all EMEA region languages, including the right-to-left languages Hebrew and Arabic, for our customers. To make localization work, dasBlog combines several techniques.

The engine looks at the Accept-Language HTTP header that all major browsers send with each request to indicate the user's preferred language. The current culture of the ASP.NET thread handling the request is set to the language identifier with the highest preference and subsequently causes all resources to be loaded from the most appropriate resource tables, with a default fallback to a neutral culture that contains all resources in English. This causes the date formatting and the calendar to be properly localized from within the .NET Framework itself and causes all hardwired strings that the weblog engine needs to emit to be rendered in the most appropriate language. For templates, the engine provides a macro that allows specifying multi-lingual strings within the HTML source. Switching the thread into the appropriate locale also enables right-to-left support for Arabic and Hebrew, because the resource tables for these locales contain a special flag that causes the engine to inject the appropriate additional dir attributes into the HTML streams.
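The language selection described above can be approximated in Python as follows. This is a simplified sketch: the function names are invented for this example, the fallback from a regional tag to its primary language is an assumption, and the hard-coded set of right-to-left languages stands in for the flag that the engine reads from its resource tables.

```python
def preferred_languages(header):
    """Parse an Accept-Language header into language tags ordered by descending q-value."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q parameter has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by quality first, then by position in the header.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]

def pick_culture(header, available, fallback="en"):
    """Choose the first preferred language that has resources, else the fallback."""
    for tag in preferred_languages(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # assumption: "de-at" may fall back to "de"
        if primary in available:
            return primary
    return fallback

# Assumption: stands in for the right-to-left flag kept in the resource tables.
RTL_LANGUAGES = {"ar", "he"}

def html_dir(culture):
    """Return the value for the HTML dir attribute of the given culture."""
    return "rtl" if culture.split("-")[0] in RTL_LANGUAGES else "ltr"
```

For a header such as "de-AT, en;q=0.5" and resources available for German and English, this sketch selects German and a left-to-right layout.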

In addition to these general localization techniques, it is possible to explicitly set the language for every weblog entry so that only visitors who have this language listed in their browser preferences (and therefore indicate that they can understand it) will see this content. If an entry uses the invariant culture default setting, it is shown to all visitors, independent of language preference. If a language is set for individual entries, this is also reflected in the XML data streams rendered by dasBlog, where the respective elements are labelled with the appropriate xml:lang attribute.
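The per-entry filter can be sketched in Python along these lines. The entry structure is hypothetical, an empty language string stands for the invariant culture, and exact tag matching is an assumption made for this example.

```python
def visible_entries(entries, accepted_languages):
    """Keep entries with no language set, plus those whose language the visitor accepts."""
    accepted = {tag.lower() for tag in accepted_languages}
    visible = []
    for entry in entries:
        language = entry.get("language", "").lower()  # "" stands for the invariant culture
        if language == "" or language in accepted:
            visible.append(entry)
    return visible

# Hypothetical entries; the visitor's browser lists English and French.
entries = [
    {"title": "Release notes", "language": ""},
    {"title": "Neues aus Deutschland", "language": "de"},
    {"title": "Conference report", "language": "en"},
]
shown = visible_entries(entries, ["en", "fr"])
```

In this example the German entry is hidden, while the entry without a language setting is shown to everyone.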

The combination of these techniques demonstrates that flexible localization is very possible for Web sites in general and we found that the effort it took to implement the complete localization support was very low when using the .NET Framework. In fact, the code required to make localization work was added to the already existing application and deployed in less than two days.

Threading Model and Hosting

dasBlog is hosted in an application domain inside the ASP.NET worker process on Windows 2000 and Windows XP, or in the Web application runtime of Internet Information Services (IIS) 6.0 on Windows Server 2003. Because of the focus on a single user's weblog and the resulting limited traffic, it is safe to minimize I/O workload by using aggressive caching, incrementally loading content at the time of the first request and keeping it cached for further requests. Because users very rarely browse through the weblog's history, typically only the content of the last month is cached, and even for a very busy weblog this means an in-memory data volume of well under 2 MB. Requests for binaries and pictures are handled and served by the Web server directly and are therefore not a concern for the engine.

Because of these considerations, operation in a Web farm where multiple servers operate on the same content store is not explicitly supported, and therefore not a test requirement. That said, all content updates cause a shared file to be updated. Whenever this file's time stamp changes, indicating an update to the file's content, the internal caches are discarded and incrementally reloaded. Clustered operation is therefore possible without causing cache coherency problems. Both failover clustering and load balancing could be implemented by pointing the Web site's storage directory to a network drive, but support for this is, as explained, not a primary concern.
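The combination of incremental loading and time-stamp-based invalidation might look roughly like the following Python sketch. The class name, the marker file, and the per-month granularity of the loader are assumptions made for illustration, and a production version would also need locking around the cache.

```python
import os

class ContentCache:
    """Cache per-month content and discard it when a shared marker file changes."""

    def __init__(self, marker_path, load_month):
        self._marker_path = marker_path  # hypothetical shared file touched on every update
        self._load_month = load_month    # callable that reads one month from the store
        self._months = {}
        self._marker_mtime = self._current_mtime()

    def _current_mtime(self):
        try:
            return os.stat(self._marker_path).st_mtime_ns
        except FileNotFoundError:
            return None

    def get(self, month):
        mtime = self._current_mtime()
        if mtime != self._marker_mtime:
            # The store was changed, possibly by another server: drop everything.
            self._months.clear()
            self._marker_mtime = mtime
        if month not in self._months:
            # Incremental load: a month is read only when it is first requested.
            self._months[month] = self._load_month(month)
        return self._months[month]
```

Because every server compares the same file's time stamp, none of them needs to be told about an update directly; each one notices the change on its next request.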

A more central concern is to design an appropriate threading model to handle work inside the engine. Of course, the main purpose of the engine is to respond to synchronous HTTP requests for Web service invocations and HTML resources and therefore we can rely on ASP.NET's threading model for the vast majority of the work: Each request is served by a thread that's allocated from the ASP.NET thread pool.

However, there are several activities that the engine can and should perform asynchronously and in background in order to maximize request/response performance. Because the asynchronous actions differ slightly in their execution characteristics, dasBlog employs three variants of thread use:

The first thread model is shared by two features with similar characteristics: the Mail-To-weblog feature and the Xml Storage System (XSS) update service. We have not discussed XSS in this article because it mainly serves to provide an easier migration path for users switching to dasBlog from Radio Userland. The threads for these two features perform periodic actions. Mail-To-weblog periodically polls a POP3 account and the XSS thread periodically pushes a static copy of the current RSS document to a remote Web service that fronts a distributed file storage system. The only difference between them is that the XSS thread can be explicitly triggered by signalling an event so that content updates are quickly synchronized into the remote store. Neither of these threads is time critical and therefore both run with a below-normal thread priority, which means that they only get served when the system is not busy. Both threads are designed to start up immediately when the application domain is started and to loop for as long as the application domain runs. Both threads are robust against internal failures and will exit gracefully when the application domain shuts down and tears down all running background threads. Except for the signalling event of the XSS thread, the main application does not actively communicate with either thread; instead, the threads invoke functions on their own initiative.
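A periodic worker of this kind can be sketched in Python as shown below. The class and method names are invented for this example. Python's threading module offers no thread priorities, so the below-normal priority mentioned above has no counterpart here; a daemon thread approximates a background thread that is torn down with its host.

```python
import threading

class PeriodicWorker(threading.Thread):
    """Run an action at a fixed interval, or sooner when trigger() is called."""

    def __init__(self, action, interval_seconds):
        super().__init__(daemon=True)  # daemon threads end together with the hosting process
        self._action = action
        self._interval = interval_seconds
        self._wake = threading.Event()
        self._stopping = threading.Event()

    def trigger(self):
        """Ask the worker to run its action without waiting for the next interval."""
        self._wake.set()

    def stop(self):
        """Ask the worker to leave its loop."""
        self._stopping.set()
        self._wake.set()

    def run(self):
        while not self._stopping.is_set():
            try:
                self._action()
            except Exception:
                pass  # a failed cycle must not end the loop; real code would log this
            self._wake.wait(self._interval)  # returns early when triggered
            self._wake.clear()
```

A polling feature would simply never call trigger(), while an update service in the style of XSS would call it after each content change.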

The second, slightly different model for threads is employed for handling incoming trackings (that is, referrals, Trackbacks, and Pingbacks). Incoming trackings are important to log, but there is no urgency to get them into the data store. One somewhat critical factor relating to detailed tracking of referrers is that it results in a write operation for every incoming request, which creates concurrency issues on the files. Because the required locking results in sequential access to the file store, all trackings are written to an in-memory queue watched by a single thread that runs for the entire application domain lifespan and that processes all trackings in sequence, eliminating concurrency issues. The same strategy is used for sending notification emails to the administrator. While this technique conveniently decouples foreground and background tasks, the primary motivation of this model is to serialize access to limited resources.
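The queue-plus-single-writer arrangement can be illustrated with the following Python sketch. The class name and the write callback are hypothetical, and the flush method exists only to make the example easy to verify.

```python
import queue
import threading

class TrackingLog:
    """Serialize writes by funnelling them through one queue and one writer thread."""

    def __init__(self, write):
        self._write = write  # callable that persists one tracking record
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def record(self, tracking):
        """Called from request threads; returns without touching the store."""
        self._pending.put(tracking)

    def _drain(self):
        # Only this thread ever calls the write callback, so no file locking is needed.
        while True:
            tracking = self._pending.get()
            try:
                self._write(tracking)
            except Exception:
                pass  # real code would log the failure and keep going
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until everything queued so far has been written."""
        self._pending.join()
```

Request threads pay only the cost of an in-memory enqueue, and the store sees exactly one writer regardless of how many requests arrive at once.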

The third usage scenario for threads covers the Pingbacks, Trackbacks and change notifications that dasBlog actively sends to other sites. Serialization of requests is not required here, because notifications are usually targeted at several different sites, but decoupling from the main thread is imperative because sites might be very slow or unreachable, causing delays of several seconds that can quickly add up when a series of automated Pingbacks needs to be issued based on a single post. Doing this in the thread processing the request (submitting a new or changed post) would mean that the response is delayed by the cumulative time needed for the notifications, which is clearly unacceptable. Therefore, the data that is required to execute these external notifications is packaged into jobs that are submitted to the .NET thread pool for execution.

Using the .NET thread pool permits concurrent execution once threads become available for servicing, but does not create undue stress by recreating new threads all the time. Instead, the active threads in the pool are being reused. This thread pool is shared with the ASP.NET runtime, which by itself puts a global throttle on the threads that can be executed concurrently, limiting the potential to overstress the machine. At the same time, this approach creates a limited risk of drying up the ASP.NET thread pool and causing external requests to be queued up—at the extreme it may even cause the ASP.NET application domain to recycle (shut down and restart). As in many application designs, this is a case where the advantages (easy programming model) must be weighed against the disadvantages and risks. The special use-case here does justify the chosen model, but if these actions weren't confined to the rare case of updating and adding content, the model used for handling incoming trackings described earlier would be a better choice.
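The job-submission pattern for outgoing notifications can be sketched in Python with a bounded executor. Unlike the arrangement described above, this sketch uses a dedicated pool instead of one shared with the request-handling runtime, and the pool size, function names, and send callback are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# A bounded pool: worker threads are reused and concurrency is capped.
notification_pool = ThreadPoolExecutor(max_workers=4)

def notify_all(targets, send):
    """Queue one job per target and return at once; send(target) runs on pool threads."""
    return [notification_pool.submit(send, target) for target in targets]
```

The caller gets a list of futures back immediately, so the thread that saved the post can finish its response while slow or unreachable sites are still being contacted in the background.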

Summary

Weblogs are an extremely promising, but much hyped, new phenomenon. There are overly enthusiastic claims that weblogs will change and shake the foundations of journalism and even democracy. While these ideas might dramatically overstate their importance, weblogs present some real advantages and opportunities for collaboration and a quick and easy way to publish content in an organized manner.

However, as we have shown, creating a weblog engine that is able to act as a node in an ever changing, growing, distributed, and cross-platform application network comes with a few architectural challenges, even if the resulting application is relatively small. It must be easy to deploy, easy to use, and easy to customize; navigation must be simple, intuitive and responsive; and it must be able to interact and integrate with a wide variety of systems across many platforms.

Weblogs are proof that the first generation of Web services is working and that deep interaction between foreign systems is not just a vision of the future, but today's reality, deployed across thousands of servers running Unix, Linux, MacOS X, Windows and many more platforms. They are also proof that commitment to make integration work across all technology camps in a concerted grassroots effort does indeed yield tangible results. None of the weblog technologies highlighted in this article has ever seen a formal standards committee.

Resources

An installable version of dasBlog and the source code of the latest release can be downloaded from the project's documentation website at https://www.dasblog.net.

 

About the Author

Clemens is co-founder and executive team member of newtelligence AG, a developer services company headquartered in Germany. He is a Microsoft Regional Director for Germany. A well-known developer and architect trainer, he is a popular conference speaker, author/co-author of several books, and maintains a widely read and frequently referenced weblog focused on architectural and development topics at https://staff.newtelligence.com/clemensv. Clemens can be reached at clemensv@newtelligence.com

This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.

© Microsoft Corporation. All rights reserved.