WIP: Full doc proposal for Campus VCS Service
Version control systems (VCSs) are essential in any programming work environment and are part of the culture of any engineering environment. On the Urbana campus, version control systems are currently used by every surveyed unit (see Current Usage for the full list). They are used by staff, faculty, and students for administrative, research, and class-based work. Because VCSs are so prevalent on campus, money, time, and manpower are being spent to license and maintain many separate systems, resulting in duplicated effort, a lack of standardization, and a lack of integration and collaboration between units' VCSs.
In the same vein as the ITPowerPlant projects, it is recommended that the Urbana campus adopt a standardized, centralized solution for providing campus-wide VCS tooling. This document provides the justification, requirements, and current progress for such a system, as well as information gathered about current VCS usage on the Urbana campus.
A VCS can be thought of as similar to the change-tracking feature in Microsoft Word, but applied to one or many directories in a file system. Any group of files can be set up as a VCS repository. Once a group of files is set up as a repository, changes can be saved to the repository as revisions, and end users can revert or update to any revision at any time. This is a huge advantage over not using a VCS. Was a file accidentally deleted? No problem: revert the change. A VCS also handles merging changes to files, so many users can work on one project simultaneously. There are more advanced features (mentioned in the Requirements section), but revision tracking is the core of a VCS.
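To make the idea of revisions concrete, the short sketch below models a repository as nothing more than a numbered list of saved snapshots. The `ToyRepository` class and its `commit`/`checkout` methods are hypothetical names chosen for illustration; real systems such as Git or SVN store history far more efficiently and offer much more, so treat this only as a picture of the concept, not of how any product works internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepository:
    """Hypothetical, simplified model of a VCS repository (illustration only).

    Each revision is a full snapshot of {file path: file contents}. Real VCSs
    such as Git or SVN store history far more efficiently; only the idea of
    saving and restoring revisions is shown here.
    """

    revisions: list[dict[str, str]] = field(default_factory=list)

    def commit(self, files: dict[str, str]) -> int:
        """Save a copy of the current files as a new revision and return its number."""
        self.revisions.append(dict(files))
        return len(self.revisions) - 1

    def checkout(self, revision: int) -> dict[str, str]:
        """Return a copy of the files exactly as they were at `revision`."""
        return dict(self.revisions[revision])


# Usage: recover an accidentally deleted file by restoring an earlier revision.
repo = ToyRepository()
working = {"report.txt": "draft 1", "notes.txt": "meeting notes"}
first = repo.commit(working)

del working["notes.txt"]           # the file is accidentally deleted...
working["report.txt"] = "draft 2"
second = repo.commit(working)      # ...and the deletion is saved as a revision

working = repo.checkout(first)     # no problem: restore the earlier revision
print(sorted(working))             # ['notes.txt', 'report.txt']
```

Because every revision is kept, restoring the earlier snapshot does not erase the later one; both remain available in the history.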
Corporations and open source communities use VCSs to coordinate multiple people working on a single program. This gives new programmers the freedom to experiment with code (if a beginner makes a change that causes a problem, it is easy to find that change and revert it) and allows experienced programmers to review changes and provide guidance. Furthermore, because all components of a project are in one location, it is relatively easy to determine whether someone else has already solved the problem you are working on.
In a classroom setting, a VCS is also invaluable because an instructor can track who made each change to a file and when. Changes can be reviewed, and instructors can see the thought process of how a student (or group of students) progressed through a programming assignment. Changes are also timestamped, so students can verify that they turned in an assignment on time. Finally, validation processes can be built into a VCS so that programs are compiled and tested automatically.
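As an illustration of how timestamps support on-time checks, the sketch below picks the newest revision committed at or before a deadline. The `last_on_time_revision` function, the revision IDs, and the idea of receiving commits as a list of `(revision_id, timestamp)` pairs (for example, exported from a VCS log) are all assumptions made for this example. One caveat: in Git, commit dates are recorded on the student's own machine and can be set manually, so a server-side record such as the time a change was pushed would be the more trustworthy source in practice.

```python
from datetime import datetime, timezone


def last_on_time_revision(commits, deadline):
    """Return the ID of the newest revision committed at or before `deadline`.

    `commits` is a hypothetical list of (revision_id, timestamp) pairs.
    Returns None if nothing was committed on time.
    """
    on_time = [c for c in commits if c[1] <= deadline]
    if not on_time:
        return None
    # Choose the on-time commit with the latest timestamp.
    return max(on_time, key=lambda c: c[1])[0]


# Usage with made-up revision IDs and times (all in UTC).
deadline = datetime(2024, 3, 1, 23, 59, tzinfo=timezone.utc)
commits = [
    ("a1", datetime(2024, 2, 28, 18, 5, tzinfo=timezone.utc)),
    ("b2", datetime(2024, 3, 1, 22, 40, tzinfo=timezone.utc)),
    ("c3", datetime(2024, 3, 2, 9, 15, tzinfo=timezone.utc)),  # after the deadline
]
print(last_on_time_revision(commits, deadline))  # b2
```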
Collaboration tools expand on this further by offering project management and other workflow-management features. Once set up, these tools can provide a self-service environment in which units work independently but within a centralized system and structure. This is similar to the campus EDW database: units request access, and once access is granted, everyone works with the same data but does different things with it. An added benefit beyond the EDW analogy is that other units can see the work you are doing (if you make it shareable) and use it or build on it. Collaboration systems support this kind of cross-unit collaboration out of the box.
Every unit currently using a VCS on campus had to decide which VCS to use, then install, maintain, and in some cases pay for it. Since every surveyed unit uses a VCS, there could be efficiency benefits to having one unit maintain a shared system, for the same reasons it is better to have a central unit manage VMs, Exchange, or any of the other essential centralized services already on campus. Personnel cost savings would also be possible since, at a minimum, a central service would eliminate cases where two people are paid by the university to manage the same thing.
A larger and more concerning form of duplication is duplicated programming. To return to the EDW analogy, how many programmers on campus have written the same queries to get the same or nearly the same information from EDW? How many programmers on campus are writing things that someone in another unit has already written? Currently the only way to find out is to reach out and ask, which can be very time-consuming, and you may not even have the right contacts. The team has done some initial probing of exactly this kind and has found numerous examples of duplicated applications that fulfill the same requirements.
Because each unit's VCS is separate, sharing repositories is cumbersome. Either a user has to exist in multiple systems with different sets of permissions, or a repository has to be duplicated across two systems. Neither scenario is ideal, because duplicate sets of records (users, permissions, repositories, and potentially groups or teams) are created in different systems. Maintaining duplicate users is especially inefficient given that the campus already has a centralized authentication system. A centralized solution using campus authentication would be a significant improvement over the current environment.
As mentioned in the previous section, sharing repositories can be extremely helpful for identifying areas with similar requirements. Programmers can then collaborate effectively and build one solution that satisfies all interested units. Better sharing also greatly benefits smaller, understaffed units that may not have the time or resources to produce the best products but can benefit from the work of others. In short, it would raise the quality floor of programming products and services across campus.
Various VCSs, at varying versions, are in use across campus. Some of these versions are several years old because the unit does not have time to upgrade (or, in one case, chose not to upgrade because the vendor started charging for its product). Older versions also raise security concerns: is the repository URL publicly discoverable, and does the version in use have vulnerabilities that were fixed in a later release? Repository structure varies as well. Some units create one repository containing all of their projects, while others create a repository per project. This makes cross-training, collaboration, and integration more difficult than they would be under a standard setup, in which moving between units would not change how you use the VCS.
As with the other current problems, the lack of standardization applies not only to the setup of the VCS but also to the code hosted on it. Across units, programming standards are currently the Wild West, yet there are industry standards that efficiently handle common problems every unit experiences. Instead of every unit learning those lessons and reaching a solution on its own, units could take a shortcut by using code shared on the campus VCS.
Most units use free desktop or web clients for version control, which provide various bells and whistles such as AD authentication and continuous integration (CI). Enterprise versions of these clients exist, and units would like to take advantage of them but cannot because of cost. Only a few larger units license VCS clients, even though other units want features available only in the enterprise versions.
The team surveyed a sample of units across campus about their VCS usage. Every unit surveyed as part of this project uses a VCS; however, units use different products for version control, primarily the "big three": Git, Subversion (SVN), and Team Foundation Server (TFS). Across all surveyed units, [need total figure] is being spent annually on licensing for version control server software. These licenses are for enterprise editions of VCS web clients, most of which are based on Git.