Distributed systems are everywhere today. The most prominent example is the internet, which hosts the world wide web. Enterprise computing environments are often distributed too, interconnecting services that range from human resources and financial departments to asset management systems. Many applications are hosted in the cloud. Finally, large-scale engineering and scientific computing relies heavily on clusters to parallelize its workload. This course explores these different aspects of distributed computing.
The concept of this course is that we want to understand how the web and distributed enterprise application environments work. We begin by exploring how to communicate over a network at the lowest level of abstraction (normally) available to programmers, the socket API. From there, we work our way up step by step to higher levels of abstraction, i.e., simpler and more powerful APIs stacked on top of each other (and ultimately grounded in sockets). This way, we gain a solid understanding of how distributed applications and the web work. We will be able to look at a website and immediately have a rough idea of how it may work, down to the nuts and bolts. For each level of abstraction that we explore, we study concrete example technologies. The goal is to get a comprehensive understanding of the following topics:
- the world wide web and web-based applications,
- distributed enterprise applications in a service-oriented architecture,
- cloud computing,
- large-scale distributed computing.
For this purpose, we start by briefly discussing the basic communication infrastructure of the internet and its protocol layers (Ethernet, IP, TCP, and UDP). We then introduce the socket API, i.e., the basic API provided by operating systems to access TCP and UDP. When discussing the socket API, we also explore data and text encoding as well as the construction of parallel servers using threads and thread pools. We learn that web browsers and web servers communicate via HTTP, a text-based protocol transmitted via TCP/IP. We implement a very primitive web server using our knowledge of sockets and HTTP. We then learn that Java Servlets are a more convenient way to implement HTTP servers. We find that servlet containers, i.e., the server software executing servlets, work almost exactly like our parallel web servers, namely by using thread pools.
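To make the step from sockets to HTTP more concrete, the following is a minimal sketch of such a primitive parallel web server. It is not taken from the course material: the class name `TinyHttpServer`, the port 8080, the pool size of 4, and the response texts are all assumptions chosen for illustration. It accepts TCP connections on a `ServerSocket`, hands each one to a thread pool, reads the HTTP request head as text, and answers with a small plain-text response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical example class, not part of the course code. */
public class TinyHttpServer {

  /** Build a complete HTTP/1.1 response for a request line such as "GET / HTTP/1.1". */
  public static String buildResponse(String requestLine) {
    boolean isGet = (requestLine != null) && requestLine.startsWith("GET ");
    String status = isGet ? "200 OK" : "405 Method Not Allowed";
    String body = isGet ? "Hello from a socket!\n" : "Only GET is supported.\n";
    // Content-Length counts bytes, not characters, so encode the body first.
    int length = body.getBytes(StandardCharsets.UTF_8).length;
    return "HTTP/1.1 " + status + "\r\n"
        + "Content-Type: text/plain; charset=utf-8\r\n"
        + "Content-Length: " + length + "\r\n"
        + "Connection: close\r\n"
        + "\r\n"
        + body;
  }

  /** Handle one connection: read the request head, send the response, close the socket. */
  public static void handle(Socket client) {
    try (Socket c = client) {
      BufferedReader in = new BufferedReader(
          new InputStreamReader(c.getInputStream(), StandardCharsets.ISO_8859_1));
      String requestLine = in.readLine();
      // Skip the header lines; the request head ends with an empty line.
      String line;
      while ((line = in.readLine()) != null && !line.isEmpty()) {
        // headers are ignored in this sketch
      }
      OutputStream out = c.getOutputStream();
      out.write(buildResponse(requestLine).getBytes(StandardCharsets.UTF_8));
      out.flush();
    } catch (IOException e) {
      System.err.println("connection failed: " + e.getMessage());
    }
  }

  /** Accept connections until the server socket is closed, one pool task per connection. */
  public static void serve(ServerSocket server, ExecutorService pool) {
    try {
      while (true) {
        Socket client = server.accept();
        pool.execute(() -> handle(client));
      }
    } catch (IOException e) {
      // Closing the server socket makes accept() throw; treat that as shutdown.
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws IOException {
    // Port 8080 and a pool of 4 threads are arbitrary choices for this sketch.
    ServerSocket server = new ServerSocket(8080);
    serve(server, Executors.newFixedThreadPool(4));
  }
}
```

With the program running, opening `http://localhost:8080/` in a browser should display the greeting. A fixed-size pool bounds the number of connections handled at the same time, which is the same idea a servlet container applies on a larger scale.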
We then learn about JavaServer Pages (JSPs), a technology for building dynamic web pages that builds on Java Servlets: our JSPs are compiled to servlets by a servlet container. Distributed enterprise computing today is largely based on web services, which use the XML-based SOAP standard and usually transmit information via HTTP. We therefore first learn about XML and how to process XML documents in Java. We then implement and use our own web services with Axis2. We learn that Axis2 can be deployed as a servlet, i.e., it uses the same infrastructure we have already learned.
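As a small illustration of processing XML documents in Java, the following sketch parses a document with the DOM API that ships with the JDK and collects one attribute from every matching element. The class name `XmlTitles`, the element names `course` and `unit`, and the attribute `title` are invented for this example and do not come from the course material.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical example class, not part of the course code. */
public class XmlTitles {

  /** Return the "title" attribute of every <unit> element, in document order. */
  public static List<String> titles(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      // Parse the whole document into an in-memory tree (DOM).
      Document doc = builder.parse(
          new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList units = doc.getElementsByTagName("unit");
      List<String> result = new ArrayList<>();
      for (int i = 0; i < units.getLength(); i++) {
        Element unit = (Element) units.item(i);
        result.add(unit.getAttribute("title"));
      }
      return result;
    } catch (Exception e) {
      // Malformed input or parser misconfiguration ends up here.
      throw new IllegalArgumentException("could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<course>"
        + "<unit title=\"Sockets\"/>"
        + "<unit title=\"Servlets\"/>"
        + "</course>";
    System.out.println(titles(xml));
  }
}
```

DOM loads the complete document into memory, which is convenient for small messages. For large documents a streaming API would likely be the better fit.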
We move on to learn how cloud computing allows us to outsource our IT infrastructure (and the associated security risks). We will see how cloud computing frameworks such as the Google App Engine allow us to deploy Java Servlets to a cloud provider’s data center. This closes the circle of enterprise and web application development in this course. At this point, we will have covered all layers of abstraction: the basic protocols IP, TCP, and UDP; the socket interface with which we access them; text encoding and parallelism; HTTP, the most important protocol of the web; Java Servlets, which build on HTTP; servlet containers, which run servlets (using TCP sockets, text encoding, and parallelism); and finally JSPs, web services, and cloud computing.
For each aspect, we explore several examples (and hands-on homework) using state-of-the-art technologies. As an added bonus, we use modern build environments and tools such as git, Maven, Travis CI, and GitHub.
Goals: Learn how distributed systems work, from bottom to top, e.g., by understanding
- how communication in a network works at the lowest level,
- how we can use these communication primitives via the socket interface,
- how computations can be parallelized by using multiple threads,
- how protocols such as HTTP work,
- how (multi-threaded) servers such as web servers work,
- how to use servlets to build servers generating dynamic content,
- how to build dynamic websites based on JSPs (which are special servlets),
- how enterprise applications are structured into frontends and application servers,
- how application servers can provide their functionality via RMI, web services, or JSON RPC, and
- how large tasks using big data can be performed in a distributed fashion using MapReduce.
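The MapReduce idea named in the last goal can be sketched with the classic word count. The code below is only a single-process, in-memory imitation of the programming model, not Hadoop and not a distributed implementation. The class `WordCountSketch` and its method names are made up for illustration. It shows the three conceptual steps: a map step that turns each line into (word, 1) pairs, a shuffle step that groups the pairs by word, and a reduce step that sums each group.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical example class, not part of the course code. */
public class WordCountSketch {

  /** Map step: one input line becomes a list of (word, 1) pairs. */
  public static List<Map.Entry<String, Integer>> map(String line) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (!word.isEmpty()) {
        pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
      }
    }
    return pairs;
  }

  /** Reduce step: all counts collected for one word are summed. */
  public static int reduce(String word, List<Integer> counts) {
    int sum = 0;
    for (int count : counts) {
      sum += count;
    }
    return sum;
  }

  /** Run map, shuffle (group by key), and reduce over all lines. */
  public static Map<String, Integer> run(List<String> lines) {
    // Shuffle: group the emitted values by their key.
    Map<String, List<Integer>> grouped = new TreeMap<>();
    for (String line : lines) {
      for (Map.Entry<String, Integer> pair : map(line)) {
        grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
            .add(pair.getValue());
      }
    }
    Map<String, Integer> result = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
      result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(run(Arrays.asList("to be or not", "to be")));
  }
}
```

In a real framework, the map and reduce calls would run on many machines and the shuffle would move data across the network. Because each map call only sees its own line and each reduce call only sees its own key, the work can be split up without shared state.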
Teacher: Prof. Dr. Thomas Weise
As course material, a comprehensive set of slides and examples is provided. Each course unit targets one self-contained topic, builds only on previously introduced topics, and provides a wide set of examples. Each example is a complete Java program that can be compiled and executed, and each focuses on exactly one phenomenon.
- Distributed Systems
- Links and Topologies
- Layered Communication
- Datatypes and Marshalling
- Text Encoding
- Threads and Parallelism
- The WWW, HTML, and URLs
- Web Forms
- Java Servlets
- JSPs and Beans
- 3-Tier Structure
- XML Schema
- XML Processing
- JSON RPC
- Cloud Computing
- MapReduce and Hadoop
The homework in this class is always a mixture of practical lab tasks and questions that test the understanding of the topics.