Result: IP Flow Analysis of UAB's Internet2 Utilization

Title:
IP Flow Analysis of UAB's Internet2 Utilization
Authors:
Publisher Information:
University of Alabama at Birmingham. Department of Electrical and Computer Engineering
Publication Year:
2018
Collection:
University of Alabama at Birmingham: UAB Digital Collections
Document Type:
Academic journal text
File Description:
application/pdf
Language:
English
Relation:
Technical report (University of Alabama at Birmingham. Department of Electrical and Computer Engineering); 2018-08-ECE-132
Technical report (University of Alabama. Department of Electrical and Computer Engineering); 2001-11-ECE-017

Technical Report 2018-08-ECE-132
Technical Report 2001-11-ECE-017
IP Flow Analysis of UAB's Internet2 Utilization
Jill Gemmill
This technical report is a reissue of a technical report issued November 2001.
Department of Electrical and Computer Engineering
University of Alabama at Birmingham
August 2018

Technical Report 2001-11-ECE-017
IP FLOW ANALYSIS OF UAB'S INTERNET2 UTILIZATION
Jill Gemmill
Department of Electrical and Computer Engineering
University of Alabama at Birmingham
November 2001

ABSTRACT
This project was designed to study utilization of UAB's connection to Internet2 networks, with a focus on applications and application performance. Questions of particular interest were:
- Who is using Internet2?
- What are typical applications?
- What is typical throughput?
- What is maximum throughput?
This report first explains what Internet2 networks are and where they came from. The UAB campus network architecture is described. The project requirements and methods are detailed, including how data was collected and analyzed. A passive monitoring device was placed at the edge of the campus so that 100% of IP flows could be recorded over a period of one year. Data from May 2000 and May 2001 were analyzed and compared. IP flow analysis summarizes performance at the application level, as experienced by the end user. Since the XNI software used for data collection was not designed for this type of study, data was moved to another server, converted to human-readable format, and placed into an SQL database. Queries were written using Perl, SQL and Crystal Reports.
The analysis includes protocol analysis; top applications by frequency and by bandwidth; throughput achieved in Mb/sec, by application; range of throughput distribution; and campus utilization by building. In answer to the questions from which the project originated:
- Who is using Internet2? Internet2 is widely used across campus.
- What are typical applications? The most popular and most bandwidth-intensive applications include World Wide Web, email, news, FTP-data, peer-to-peer applications, and streaming media.
- What is typical throughput? 97% of flows measured by the XNI software experienced throughput less than 150 kb/sec.
- What is maximum throughput? One host achieved more than 18 Mb/sec - once. Less than 0.2% of flows achieved a target throughput of 3 Mb/sec or more.
The results suggest that throughput is currently determined more by host and operating system capabilities or configuration than by the network.

TABLE OF CONTENTS
I. Introduction and Project Objective
II. Project Management
III. Prior Publication on this Topic
IV. Acknowledgements
V. Background: Network Architectures
    Internet2 Networks
    UAB Campus Network Architecture
    Fiber Plant
VI. Project Requirements
VII. Methods
VIII. Data Collection
    Data Set Description
IX. Analysis: Summary of Raw Data
    IP Protocol Analysis
    Application Data Analysis
    May 2000 TCP Application Analysis
    May 2000 UDP Application Analysis
    May 2001 TCP Application Analysis
    May 2001 UDP Application Analysis
X. Analysis: Application Throughput
    Analysis of Throughput by Application
    Throughput from a Single Host
XI. Analysis: Campus Utilization
XII. Conclusions
XIII. Possible Future Directions for Research
Footnotes
Appendix A: Glossary of Terms
Appendix B: Campus Utilization of Internet2
Appendix C: Project Presentation
TECHNICAL REPORT 2001-11-ECE-017
IP FLOW ANALYSIS OF UAB'S INTERNET2 UTILIZATION
Jill Gemmill
Submitted in partial fulfillment of requirements for MSEE degree
November 2001

I. INTRODUCTION AND PROJECT OBJECTIVE
The "Internet2" networks are high performance network backbones with usage dedicated to education and research traffic. At the present time 188 research universities, all major federal national laboratories, and many significant national research facilities are connected to these networks. High-speed connections to international education/research networks are available through STAR-TAP, a research and education network exchange facility.
The University of Alabama at Birmingham (UAB) applied for and received funding from the National Science Foundation (NSF) in 1998 to connect its campus to the national backbone [1]. Beginning as an ATM DS3* connection in May 1998, UAB Internet2 traffic now rides an OC12 circuit to the Southern Crossroads gigaPoP [2] in Atlanta, where traffic enters the Abilene national backbone. UAB maintains a second circuit for commodity Internet traffic that is separate from the Internet2 circuit, and has also invested in a substantial campus network upgrade that permits high speed network connectivity to be provided wherever needed.
* A Glossary of Technical Terms can be found in Appendix A.
Thus, the UAB connection to Internet2 provides an opportunity to understand which sets of educators/researchers at UAB make use of this resource; what performance is typically achieved; and what types of applications traverse the resource. These questions make application performance the focus of a network utilization study. Application performance analysis differs from a more typical Internet Services Provider (ISP) interest in overall capacity utilization, which lumps all network traffic together into undifferentiated bytes.
Equipment to measure network traffic was placed at the edge of the campus at the I2 router to learn answers to the following questions:
• Who at UAB is using Internet2? Is usage widespread across campus, or of use only in special locations?
• What applications are typical for Internet2?
• What throughput in megabits per second is typically achieved?
• What maximum throughput in megabits per second is achieved?

II. PROJECT MANAGEMENT
The overall project was designed and directed by Jill Gemmill, who also did most of the programming and the data analysis. Assistance for the project was provided by a number of student assistants, including Vikram Vijay (MSEE student), who did the file transfer, file conversion and some data entry, and Chakravarthy Sannedhi (MSEE student), who provided Linux system administration and SQL database administration; these graduate students were supported by the National Science Foundation Cooperative Agreement EPS-9720653. Evan Rowe and Rodney Hasty, who prepared some of the Crystal Reports graphs, are UAB undergraduate students who were supported by the National Science Foundation Research Experience for Undergraduates (REU) supplemental award to NCRI-9729500. UAB User Services hosted the measurement platform and aided in keeping the system operational.

III. PRIOR PUBLICATIONS ON THIS TOPIC
A proposal for a paper entitled "UAB's Internet2 Utilization: A Case Study Using IP Flow Analysis" with authors Jill Gemmill, Murat Tanik, Gregg L. Vaughn, Clair W. Goldsmith, and David L.
Shealy has already been submitted for acceptance to the World Conference on Integrated Design & Process Technology, June 23-28, 2002. Previous publication by Jill Gemmill on this topic includes:
• Surya Chataut, Stan McClellan, Jill Gemmill, "Tools for Application Performance Measurement," Proceedings of the Society for Design and Process Science, June 2000; and
• Jill Gemmill, "Blind Men Feeling the Elephant: Managing Network Application Performance - Standards, Tools and Challenges," Integrated Design and Process Technology, IDPT-Vol. I, June 2001, pp. 63-72.

IV. ACKNOWLEDGEMENTS
This project is based upon work supported by the National Science Foundation under Grant Numbers 9729500 and 9720653. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.
• NCRI-9729500 "High Performance Connection for Research Universities in Alabama," D. Shealy, PI; J. Gemmill, S. McClellan, P. Hancock, and D. Cordes, co-PIs
• EPS-9720653 "Alabama EPSCoR Cooperative Agreement - Infrastructure to Support Alabama and Regional Connections to Internet2," K. Pruitt, PI; D. Shealy, J. Gemmill, S. McClellan, and D. Cordes, co-PIs
Murat Tanik, Dave Shealy, and Clair Goldsmith provided valuable feedback to the author on earlier versions of this report.

V. BACKGROUND: NETWORK ARCHITECTURES
Internet2 Networks
The "Internet2" networks are high performance network backbones put into place from 1995 to the present, with usage dedicated to education and research traffic. At the present time, over 188 research universities, all major federal national laboratories, and many significant national research facilities are connected to these networks.
The continental US Internet2 network is called Abilene [3]. The Abilene network national operations center is located at Indiana University, and the network itself is supported by universities through the University Corporation for Advanced Internet Development (UCAID, also known as Internet2). The Abilene network today is an OC48 packet over SONET network. Recently, the Abilene Acceptable Usage Policy was amended to allow use by sponsored K-20 schools and libraries.

Figure 1 The Abilene Network Backbone
Map Source: Abilene Network Operations Center, Indiana University, "Abilene Weather map"

High-speed connections to international education and research networks are available through STAR-TAP [4], a network exchange facility located in Chicago. STAR-TAP provides connections to other continents and also serves as a facility for exchange of university and international traffic with federal research networks such as the Department of Energy's Energy Sciences Network (ESnet) [5], the National Aeronautics and Space Administration NREN network [6], and others.
In the Southeast region thirteen universities connect to national backbones through the Southern Crossroads (SoX) [7], housed at Georgia Institute of Technology. The SoX network has an ATM architecture, with members interconnected via ATM permanent virtual circuits. Four universities in Alabama, including UAB, connect to SoX via a network called the Gulf Central GigaPoP [8]. The Gulf Central GigaPoP interconnects UA, UAB, UAH and Auburn with ATM OC3 circuits; an ATM OC12 circuit provides UA, UAB and UAH with OC3 connections burstable to OC12 to other SoX locations and to Abilene. Alabama Research and Education Network (AREN) provides network operations for the Gulf Central GigaPoP [9].

Figure 2 The Gulf Central GigaPoP Network (red: OC12; blue: OC3; black: DS3)
UAB Campus Network Architecture
UAB maintains two separate circuits for network traffic leaving the campus: the OC12 Internet2 circuit for education/research traffic only, and a separate 45 Mb/s circuit for commodity Internet traffic.
The UAB campus occupies 90 city blocks and comprises 130+ buildings. The university began work on the campus network in 1984 and has been continuously involved in network upgrade projects since then. There are two network service organizations.
• Health Systems Information Systems (HSIS) designs the network architecture at the protocol level and provides multiple services to the UAB Hospital and buildings where patient medical care is provided.
• UAB Telecommunications Services designs and installs the network fiber and wiring infrastructure for all areas of the campus including those in the hospital areas. Telecommunications Services in addition designs the network architecture, provides network-related services to the UAB campus and also provides access to the Internet, Internet2, and Alabama Research and Education Network for both the Health System and the campus.
The network technology used for the campus backbone is Gigabit Ethernet. For efficient management, a collapsed backbone design is used. The backbone electronics direct Internet traffic across campus and out onto the Internet.

Fiber Plant
All buildings on campus are connected to one of three communication hubs using optical fiber. The fiber plant is managed by UAB Telecommunications Services. The fiber backbone consists of both single mode and multimode fiber. These fibers are used for network communications, cable TV, security cameras, medical television, and utilities monitoring. Each building has a bundle of single mode and multimode optic fiber strands running vertically through the building from the entrance point to the top floor.
Each floor (or sometimes every other floor) has a communications closet dedicated to telephone and network use. These closets are kept at a proper temperature, secured, and have uninterruptible power supply (UPS) systems for the network electronics. Horizontal wiring is run from the communications closet to each desktop. Faceplates providing combinations of Data, Voice, and Cable connectors are provided.
Within buildings, Category 5 or higher unshielded twisted pair wiring connects desktops to the network. A Gigabit Ethernet building backbone over multimode optical fiber is used for multi-floor buildings. Computer server clusters are connected to the building entrance using Gigabit Ethernet. Each floor contains one or more switches connected to the building backbone using Gigabit Ethernet. Desktops are connected at 10 or 100 megabits/second speed (gigabit available when needed). Some legacy Ethernet hubs are used where performance is not degraded or where they increase performance for a particular application.
Network electronics provide gigabit speed from a central router to the building. Each closet contains at least one switch that is connected at gigabit speed to the building fiber backbone. Additional switches are provided as needed. Most routing is done at the core router; additional routers direct network traffic onto and off of campus. UAB currently uses Foundry and Cisco routers. The core campus network backbone is a centrally located multi-gigabit backplane routing TCP/IP, IPX, and AppleTalk protocols using Gigabit Ethernet links over single mode optical fiber. The campus has enabled IP Multicast traffic but has deployed this only in non-production network segments.
The Figure 3 diagram shows how the campus backbone connects to outside networks. The dotted vertical line denotes the transition from ATM to Ethernet technology. The conversion occurs inside a Cisco LS1010 switch.
Commodity Internet traffic (depicted by blue arrows) is routed to a DS3 circuit leased from ITC^DeltaCom. The dual Internet2 OC3 circuits connect directly to the SoX equipment room at Georgia Tech.

Figure 3 UAB Network Connections to the Outside World. UAB-managed equipment filled with gray (diagram labels: Commodity Internet; AREN Network Traffic)

Routing to outside networks is accomplished using two separate Cisco 7507 routers. Based on destination IP address, the Foundry BigIron backbone router directs Internet2 traffic to the router at the top; all other traffic, including Alabama Research and Education Network (AREN) traffic, is routed to the lower router. Note the placement of an additional Cisco 7507 router between the UAB I2 router and the LS1010 switch; that router is managed by AREN and also handles Internet2 traffic for the University of Alabama and University of Alabama in Huntsville. The red dot in the diagram indicates the position of the monitoring device used in this project.

VI. PROJECT REQUIREMENTS
• In order to answer questions about applications and use, network traffic must be observed as a set of application flows; this corresponds to the users' experience using the network.
• To understand how use of Internet2 is distributed across university departments and units, some means was needed for associating a department or building with each application recorded.
• To identify types of applications in current use, both the IP protocol (TCP or UDP) and the type of application needed to be identified.
• To analyze bandwidth utilization, total numbers of bytes and/or packets per application were needed.
• For throughput, usually recorded as bytes/second or megabits/second, the total elapsed time for the application was needed in addition to total number of bytes or packets per flow.
• To study utilization changes over time, information about each application instance would need to be recorded and stored for later analysis.

VII. METHODS
An established method for observing individual applications is IP flow analysis [10]. A software product named XNI [11] was found that provided much of the information needed according to the requirements established for this project. Running on a Unix host, XNI opens the network interface in promiscuous mode and allows the host machine to see not only the traffic destined for itself but also all traffic on the local network. By mirroring traffic from the Internet2 network port to the XNI passive monitor, it was possible to observe and record all IP flows at the campus ingress/egress point for Internet2 traffic. The diagram below indicates placement of the XNI passive monitoring device.

Figure 4 Placement of XNI Monitoring Station (diagram elements: UAB Campus Backbone Router (Foundry BigIron); Internet2 Router (Cisco 7507); ATM Switch (Cisco LS1010); Internet2 traffic via fiber; Linux box with XNI software attached by port mirroring)

Figure 7 May 2000: Top 20 Applications by Frequency

[...] downloads, and FTP.
Ten TCP applications were the most bandwidth-intensive and together constituted 80% of the traffic for the month of May 2000.

Figure 8 Top Ten Applications Account for Majority of Bandwidth
May 2000: Top 10 TCP Applications as % Total TCP Bandwidth: top 10 TCP bytes 80%; other TCP bytes 20%
May 2000: Top 10 TCP Applications as % Total Bandwidth: top 10 TCP bytes 79%; other total bytes 21%

The bandwidth-intensive applications include newsgroup download, web traffic, FTP-data, instant messaging services, peer-to-peer multimedia (such as Napster*), X-Windows and Email. Of these, News was 58% of all traffic in May 2000 (see Figure 9).
* "Napster" is used throughout this report to describe applications using ports 6699 and 6688.

Figure 9 May 2000: Top 10 TCP Applications by Bandwidth Used (series: % total TCP bytes; % all bytes)

May 2000 UDP Application Analysis
Looking at UDP traffic for May 2000, 5 applications accounted for 94% of UDP and 45% of all flows for the month.

Figure 10 Top 5 UDP Flows as Percent Total Flows
May 2000: Top 5 UDP Flows as % UDP Flows: top 5 UDP flows 94%; other UDP flows 6%
May 2000: Top 5 UDP Flows as % Total Flows: top 5 UDP flows 45%; other total flows 55%

Eighty percent of UDP flows were due to the Domain Name System (DNS); this is not surprising since DNS is a basic Internet service that makes it possible to route network packets by IP address. Network time is another service that requires sending out requests for time synchronization. MS-IP-DNS is Microsoft's NetBIOS name service.

Figure 11 May 2000: Top 5 UDP Applications

Examining UDP traffic by bandwidth utilization, 7 flows accounted for 56% of UDP traffic. UDP traffic consisted mostly of streamed multimedia (Real Player and QuickTime audio/video streams), as well as Quake traffic and DNS services.

Figure 12 Top UDP Flows as Percent Total Bandwidth
May 2000: Top 7 UDP Flows as % UDP Bandwidth: top 7 UDP flows 56%; other UDP flows 44%
May 2000: Top 7 UDP Flows as % Total Bandwidth: top 7 UDP flows 6%; other total flows 94%
Figure 17 May 2001: Top 10 TCP Applications by Bandwidth Used

May 2001 UDP Application Analysis
Five UDP flows account for 96% of total UDP flows and 40% of all flows for the month.

Figure 18 Top 5 UDP Flows as Percent Flows
May 2001: Top 5 UDP Flows as % UDP Flows: top 5 UDP flows 96%; other UDP flows 4%
May 2001: Top 5 UDP Flows as % Total Flows: top 5 UDP flows 40%; other total flows 60%

DNS is again the predominant application; IP multicast traffic appears in the list for the first time, indicating significant growth in the amount of this type of traffic on Internet2.

Figure 19 May 2001: Top 5 UDP Applications (series: % UDP flows; % total flows)

Examining UDP traffic by bandwidth utilization, 7 flows accounted for 91% of UDP traffic.

Figure 20 Top 7 UDP Flows as Percent Bandwidth
May 2001: Top 7 UDP Flows as % UDP Bandwidth: top 7 UDP bytes 91%; other UDP bytes 9%
May 2001: Top 7 UDP Flows as % Total Bandwidth: top 7 UDP bytes 5%; other total bytes 95%

Figure 21 May 2001: Top 7 UDP Flows by Bandwidth Used

UDP applications included multimedia flows again, including Real Player, QuickTime, and IP multicast traffic. A new application in the list is Synchronet, a collaborative application that allows the user to establish a custom online service supporting multiple simultaneous users with hierarchical message and file areas, multi-user chat, and games.
TaskMaster 2000 is an application that is used in time and motion studies.

X. ANALYSIS: APPLICATION THROUGHPUT
Throughput was calculated as total bytes transferred / time elapsed from application start to end. As shown in Figures 22 and 23, more than 97% of applications experienced total throughput of less than 150 kb/sec, rather disappointing for what is termed a high performance network. While one host/application achieved more than 18 Mb/sec throughput in 2001, that was an almost aberrant performance.

Figure 23 May 2001: Throughput Distribution - Total Bytes (x-axis: throughput in Mb/sec; y-axis: % frequency, logarithmic scale)

[...] experience 80 to 100 megabit per second (Mb/sec) throughput. Unfortunately, it became apparent that it was quite rare and quite difficult to achieve significantly improved throughput. A study conducted by the Stanford Linear Accelerator Lab showed that the performance on the Internet2 network was only twice that of the commodity Internet [15].

Figure 24 Mean Throughput (May 2000 and May 2001)

Mean throughput is shown in Figure 24; the bars indicate standard deviation.
The PITAC report mentions 3 Mb/sec as a desirable high-speed network minimum performance goal (3 Mb/sec is 30% of a 10 Mb/sec switched connection; a network "rule of thumb" suggests that an Ethernet network operating at 30% capacity will not suffer from problems of congestion). IP flows with 3 Mb/sec or greater throughput were therefore examined. Table 4 summarizes the very small number of flows that pass this test.

Table 4 Percent Total Flows Exceeding 3 Mb/sec Throughput
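The throughput calculation described in this section (total bytes transferred divided by elapsed time) and the 3 Mb/sec test can be sketched as follows. This is a minimal illustration, not the Perl/SQL queries used in the project: the `Flow` record, its field names, the sample values, and the convention that 1 Mb equals 1,000,000 bits are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Flow:
    """Hypothetical flow record; the field names are illustrative only."""
    application: str
    total_bytes: int
    start_s: float  # flow start time, in seconds
    end_s: float    # flow end time, in seconds


def throughput_mbps(flow: Flow) -> Optional[float]:
    """Total bytes / elapsed time, expressed in megabits per second.

    Assumes 1 Mb = 1,000,000 bits. Returns None when the elapsed time
    is zero or negative, since no rate can be computed for such a flow.
    """
    elapsed = flow.end_s - flow.start_s
    if elapsed <= 0:
        return None
    return flow.total_bytes * 8 / elapsed / 1_000_000


def share_at_or_above(flows: Iterable[Flow], threshold_mbps: float = 3.0) -> float:
    """Fraction of measurable flows whose throughput meets the threshold."""
    rates = [r for r in (throughput_mbps(f) for f in flows) if r is not None]
    if not rates:
        return 0.0
    return sum(r >= threshold_mbps for r in rates) / len(rates)


# Made-up sample data, not taken from the report.
sample_flows = [
    Flow("ftp-data", 3_750_000, 0.0, 10.0),  # 30,000,000 bits over 10 s
    Flow("www", 15_000, 0.0, 1.0),           # 120,000 bits over 1 s
]
```

With the sample data, the first flow sits exactly at the 3 Mb/sec goal and the second is far below it, so `share_at_or_above(sample_flows)` counts one of the two flows. Skipping zero-duration flows is a design choice made for the sketch; the report does not say how such flows were handled.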
Every IP flow with throughput >= 3 Mb/sec in May 2000 was a TCP flow. This was true for flows in May 2001 as well, with the exception of one IP Multicast flow and one ICMP flow. This result is surprising, since TCP flows require acknowledgements and experience a sort of "pulsing" throughput due to the filling and draining of the TCP window. Work done by Mark Gates of the National Laboratory for Advanced Network Research (NLANR) at Pittsburgh Supercomputer Center found that the original TCP protocol did not anticipate throughput possible in today's high performance networks; TCP window size therefore is set to 64 kilobytes or less unless the user manually enables "Large TCP Windows"; this is true across operating systems [16]. The well-known Nagle's algorithm has to do with TCP trying to be efficient by waiting until a fair amount of data has filled a local buffer and can be sent in a bundle; if the application needs to send many small packets quickly, this can introduce delay. Maximum transfer unit (MTU) defines the size of the largest packet that can traverse the network without being broken up into smaller packets. The Ethernet MTU is 1500 bytes, and the ATM MTU is 8 kilobytes. Ideally, the host should do "MTU discovery" to determine the largest size packet allowed in the current path [17].

Analysis of Throughput by Application
Does throughput vary according to the application? To answer this question, throughput was analyzed by application; in particular, applications with the highest throughput were examined.

Figure 25 May 2000: Throughput by Application (WWW, FTP, Mail, Dwyco, Napster; y-axis: Mb/sec; markers: median, quartiles, tenths)
Figure 26 May 2000: Mean Throughput by Application (WWW, FTP, Mail, Dwyco, Napster)
The plots use a method from the Cross-Industry Working Team [18]. Data plotted with this method is shown as median, range spanning quartiles, and span of "outlier" data points (in the 0-10th percentile, and 90-100th percentile). A more traditional analysis showing mean throughput per application with bars indicating standard deviation was also done.

Figure 27 May 2001: Throughput by Application (WWW, FTP-data, Mail, Dwyco, Napster; y-axis: Mb/sec; markers: median, quartiles, tenths)
Figure 28 May 2001: Mean Throughput by Application (WWW, FTP-data, Mail, Dwyco, Napster)

Standard statistical summaries do not allow us to compare a single application from one year to the next. (Examining difference of means, we cannot conclude that the application data from May 2000 has the same mean as that application's data in May 2001; if we can't compare differences in a single application across the year, it doesn't make any sense to compare across applications.)

Figure 29 Two Year Comparison - Throughput by Application (WWW, FTP-data, Mail, Dwyco, Napster; y-axis: Mb/sec)

Throughput from a Single Host
In May 2000 there were only 1671 flows with recorded throughput greater than or equal to 3 Mb/sec. A single IP number in the McCallum Building (let's call it 138.26.25.A) achieved this rate 1099 times, i.e., this single host accounted for 66% of the [...]
[...] performance, independent of the network.

XI. ANALYSIS: CAMPUS UTILIZATION
Campus network users do not need to do anything special to access Internet2.
Routing is address based: if the destination address is known to belong to a location with an Internet2 network address, the router sends traffic out to Internet2. While it was known that everyone on campus had the capability of using Internet2, how widespread such usage might be was not known.
The UAB campus network uses static IP addressing and does not currently support DHCP. IP addresses are from one of UAB's two assigned class B address blocks: 138.26.X.X or 164.111.X.X. Subnetting is used in router configuration to assign subsets from this address space to particular router ports. Each router port generally connects to a single building, although there are cases where multiple buildings are connected. Given that the XNI observation point was at the campus edge, either the source address or the destination address would be a UAB IP number (except for multicast addresses). Therefore it was possible to use the campus NETWORK.TXT documentation of IP subnet range assignments to map the UAB address in each flow to a particular campus building.
Without exception, every building on campus had Internet2 traffic in both May 2000 and May 2001. The tables showing utilization for each building for both May 2000 and May 2001 are included in Appendix B.
Surprisingly, 40-50% of Internet2 utilization is by the network services group (Rust Building). Tables 5 and 6 summarize utilization as a percentage of total number of flows and percent total bandwidth used. Aside from usage in the Rust Building, top users in May 2000 included a library, hospital facilities, medical school research centers, engineering and computer sciences, and university administration. By May 2001, the network service group's bandwidth usage was less than 50%, meaning that typical campus usage was increasing faster than network service use.

Table 5 May 2000: Top Users of Internet2, by Campus Building (surviving column headings: Building; Total Number; % Total; surviving row label: Kracke)
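The mapping of each flow to a campus building described in this section can be sketched with Python's standard `ipaddress` module. This is only an illustration under stated assumptions: the two /16 blocks come from the text, but the subnet-to-building entries below are hypothetical placeholders, since the contents and format of NETWORK.TXT are not given in the report.

```python
import ipaddress
from typing import Optional

# UAB's two class B blocks, as stated in the text.
UAB_BLOCKS = [
    ipaddress.ip_network("138.26.0.0/16"),
    ipaddress.ip_network("164.111.0.0/16"),
]

# Hypothetical subnet assignments, standing in for NETWORK.TXT.
SUBNET_TO_BUILDING = {
    ipaddress.ip_network("138.26.10.0/24"): "Building A",
    ipaddress.ip_network("164.111.20.0/23"): "Building B",
}


def campus_address(src: str, dst: str):
    """Return whichever endpoint of the flow is a UAB address, if any."""
    for text in (src, dst):
        addr = ipaddress.ip_address(text)
        if any(addr in block for block in UAB_BLOCKS):
            return addr
    return None  # e.g. a multicast flow with no UAB unicast endpoint


def building_for(src: str, dst: str) -> Optional[str]:
    """Map a flow to a building name using the subnet table."""
    addr = campus_address(src, dst)
    if addr is None:
        return None
    for subnet, building in SUBNET_TO_BUILDING.items():
        if addr in subnet:
            return building
    return "unassigned"  # UAB address without an entry in the table
```

For example, `building_for("192.0.2.7", "138.26.10.5")` resolves through the destination address, while a flow whose endpoints both lie outside the two UAB blocks returns `None`. A linear scan over the subnet table is enough for a sketch; a table with many subnets would be better served by a prefix tree or a sorted lookup.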
Table 6 May 2001: Top Users of Internet2, by Campus Building

Looking at traffic for the Rust Building only, we see that most of the bandwidth consists of network news feeds, and the vast majority of flows are network services including DNS, Cisco Auth and ICMP, followed by significant E-mail and Web traffic.

Figure 32 May 2000: Internet2 Utilization by UAB Telecommunications Services (panels: May 2000 RUST Bandwidth Utilization; May 2000 RUST Flow Frequency)
Figure 33 May 2001: Internet2 Utilization by UAB Telecommunications Services (panels: May 2001 RUST Bandwidth Utilization; May 2001 RUST Flow Frequency)

Examining distribution of destinations was difficult. Only 12% of the flows in May 2000 and 10% of flows in May 2001 had names that could be resolved using DNS. For the small sample with resolvable names:

Table 7 Top DNS-Resolvable Destination Sites

XII. CONCLUSIONS
• Who at UAB is using Internet2?
Is usage widespread across campus, or of use only in special locations?
  o Internet2 is widely used across campus.
  o Internet2 "top users" comprise a mix of technical services, administration, libraries, and research facilities.
• What applications are typical for Internet2?
  o Applications most frequently used at the UAB campus on the Internet2 network are web, Email, News, FTP-data, and peer-to-peer applications.
  o The most bandwidth intensive applications are News, web, FTP-data, peer-to-peer, and streaming audio/video.
  o TCP traffic accounts for 95% or more of the bandwidth used.
  o About half the flows are UDP.
• What throughput in megabits per second is typically achieved?
  o Fewer than 0.5% of all flows achieve measured throughput rates >= 3 Mb/sec.
  o 97% of flows experience throughput less than 150 kb/sec.
• What maximum throughput in megabits per second is achieved?
  o Maximum observed throughput was 18.6 Mb/sec.
  o A single host accounting for 66% of flows in May 2000 having throughput >= 3 Mb/sec turned out to be a Power Macintosh model 7200 with Mac OS 7.
• What else has been learned?
  o Internet2 usage doubled (in both number of flows and total bandwidth used) from May 2000 to May 2001, thus exhibiting the same geometric growth seen in the commodity Internet.
  o Finding popular applications is a good way to learn from the UAB community about interesting new applications.
  o Use of collaborative environments and videoconferencing is increasing.
  o High throughput is clustered around certain IP numbers; this suggests that the network has "gotten out of the way" and throughput is determined more by application design, operating system and computer architecture and configuration.
  o Improving throughput will be accomplished by focusing on hardware selection and operating system tuning.

XIII. POSSIBLE FUTURE DIRECTIONS FOR RESEARCH

• On-going utilization study
• Find applications that are candidates for improved throughput?
• Compare I2 usage to commodity utilization
• User-centric monitoring: wouldn't it be nice to
  o Study your average & max throughput (click on your IP number)
  o Find high throughput performers & figure out why
  o Take steps to improve throughput & see results

FOOTNOTES

1 High Performance Connection for Research Universities in Alabama, NSF Award 9729500, D. Shealy, J. Gemmill, S. McClellan (UAB); P. Hancock, D. Brown (UA)
2 http://www.sox.net/
3 http://www.internet2.edu/abilene/
4 http://www.startap.net/
5 http://www.es.net/
6 http://www.nren.nasa.gov/
7 http://www.sox.net/
8 http://www.gcgpop.net/
9 http://www.asc.edu/network/
10 Mogul, J. C. (1993), "Observing TCP Dynamics in Real Networks," in Proceedings of the ACM SIGCOMM '92 Conference, pp. 281-292.
11 http://www.bluebox.com/products.html
12 http://www.iana.org/assignments/port-numbers
13 http://www.networkice.com/
14 http://www.practicallynetworked.com/sharing/app_port_list.htm
15 ESnet Network Monitoring Task Force. Internet End-to-End Performance Monitoring (IEPM), June 1999. http://www-iepm.slac.stanford.edu/
16 User's Guide to TCP Windows, http://www.ncsa.uiuc.edu/People/vwelch/net_perf/tcp_windows.html
17 Enabling High Performance Data Transfers on Hosts, http://www.psc.edu/networking/perf_tune.html
18 Internet Service Performance: Data Analysis and Visualization. Cross-Industry Working Team, July 2000.

APPENDIX A: GLOSSARY OF TERMS

GLOSSARY¹

Abilene: Abilene is an advanced backbone network that supports the development and deployment of the new applications being developed within the Internet2 community. Abilene connects regional network aggregation points, called gigaPoPs.
ATM: Asynchronous Transfer Mode (ATM) is a means of digital communications that is capable of very high speeds. It is used for the transport of voice, video, data and images.
ATM is an International Telecommunications Union - Telecommunication Standardization Sector (ITU-T) standard for cell relay. Information is conveyed in small, fixed-size cells.
Class B Address: Class B networks (128.0.x.x to 191.255.x.x) have a 16-bit network prefix with a 16-bit host number, and are also referred to as /16 networks. The first two bits of the prefix are set to 10, hence there are 2^14 (16384) possible Class B networks; currently around 12000 are defined (65%).
DHCP: The Dynamic Host Configuration Protocol (DHCP) is an Internet protocol for automating the configuration of computers that use TCP/IP. DHCP can be used to automatically assign IP addresses, to deliver TCP/IP stack configuration parameters such as the subnet mask and default router, and to provide other configuration information such as the addresses for printer, time and news servers.
DNS: The Domain Name System (DNS) is a distributed Internet directory service. DNS is used mostly to translate between domain names and IP addresses, and to control Internet email delivery.
DS3: Digital Signal Level 3 (DS3): A framing specification for digital signals in the North American digital transmission hierarchy. A DS3 signal has a transmission rate of 44.736 Megabits per second (usually referred to as 45 Mbps). This digital service transmits data over fiber optic cable.

¹ Many definitions in this table were adapted from the excellent resource at SearchNetworking.com http://searchnetworking.techtarget.com/sDefinition/0,,sid7_gci214172,00.html

Dwyco/ICU II: The Dwyco Video Conferencing System is a program for Windows 95/98/NT4/ME/2000 that allows one to send and receive video, audio, and chat in real-time across the Internet. It is very flexible, and works equally well as a video chat tool or a video broadcast server. The ICUII Videochat Program features 320 x 240 video images, one to one VideoChat / Audio Chat / Text Chat.
Users have the ability to create meeting rooms where multiple users can connect, share video and converse with each other.
FTP: File Transfer Protocol (FTP) allows the user to transfer files across the Internet from one computer to another.
gigabit: One billion bits.
gigaPoP: A very high speed PoP.
Gnutella: Gnutella is a fully-distributed information-sharing technology. The client software is basically a mini search engine and file serving system in one. When you search for something on the Gnutella Network, that search is transmitted to everyone in your "horizon". If anyone had anything matching your search, he'll tell you.
ICMP: ICMP (Internet Control Message Protocol) is a message control and error-reporting protocol between a host server and a gateway to the Internet. ICMP uses Internet Protocol (IP) datagrams.
IGMP: The Internet Group Management Protocol (IGMP) is an Internet protocol that provides a way for an Internet computer to report its multicast group membership to adjacent routers.
IP: Internet Protocol (IP)
IP Multicast: Multicast is communication between a single sender and multiple receivers on a network; communications are sent to a group address (Class D address in the range 224.0.0.0 to 239.255.255.255) rather than to a unique IP number.
KaZaA: KaZaA is a media community, where millions of community members can share their media files - audio, video, images and documents - with each other.
kb/sec: Kilobits per second. One kilobit = 1000 bits.
MAC: Media Access Control (MAC) address is your computer's unique hardware number. The MAC address is used by the Media Access Control sublayer of the Data-Link Layer.
Mb/sec: Megabits per second. One megabit = 1,000,000 bits.
Napster: Napster's software application enables users to locate and share media files from one convenient, easy-to-use interface. It also provides media fans a forum to communicate their interests and tastes with one another via instant messaging, chat rooms, and Hot List user bookmarks.
NetBIOS: NetBIOS (Network Basic Input/Output System) is intended for use within a local area network. It was created by IBM for its early PC Network, was adopted by Microsoft, and has since become a de facto industry standard.
NetNews: A newsgroup is a discussion about a particular subject consisting of notes written to a central Internet site and redistributed through Usenet, a worldwide network of news discussion groups. Usenet uses the Network News Transfer Protocol (NNTP).
NGIX: Next Generation Internet Exchange (NGIX): A location where different Internet Service Providers agree to exchange traffic destined for the other's network.
OC3: Optical Carrier Level 3 (OC3). The Synchronous Optical Network (SONET) includes a set of signal rate multiples for transmitting digital signals on optical fiber. The base rate (OC-1) is 51.84 Mbps. Certain multiples of the base rate are provided as shown in the following table. Asynchronous transfer mode (ATM) makes use of some of the Optical Carrier levels. OC3 is 155.52 Mbps.
OC12: Optical Carrier Level 12 (OC12): 622.08 Mbps.
OC48: Optical Carrier Level 48 (OC48): 2.488 Gbps.
PoP: A point-of-presence (PoP) is an access point to the Internet.
PVC: A permanent virtual circuit (PVC) is a software-defined logical connection in a network such as a frame relay network.
Quake: A distributed computing game played in real time; uses 3D graphics for game room.
RPC: Remote Procedure Call (RPC) is a networking technology developed by Sun Microsystems. It is used on most UNIX machines, and is a popular way of building networked applications; it is also a popular system vulnerability probed by hackers.
RTP: Real-time Transport Protocol (RTP) is the Internet-standard protocol for the transport of real-time data, including audio and video. The
RTP is both an IETF Proposed Standard (RFC 1889) and an International Telecommunications Union (ITU) Standard (H.225.0). RTP is used by both RTSP and H.323 for the data portion of these protocols. RTP consists of a data and a control part. The latter is called RTCP.
RTSP: Real Time Streaming Protocol (RTSP) is a client-server multimedia presentation control protocol for video on demand applications. RTSP is an Internet Engineering Task Force (IETF) standard: RFC 2326. RTSP provides "VCR-style" control functionality such as pause, fast forward, reverse, and absolute positioning.
SONET: Synchronous data transmission on optical media (SONET) is the American National Standards Institute standard for a number of line rates up to the maximum line rate of 9.953 gigabits per second on optical media.
SQL: Structured Query Language (SQL): allows users to access data in relational database management systems by allowing users to describe the data the user wishes to see.
SSL: Default port for secure web transactions (https://); the server encrypts the transaction to secure the transaction content, such as might be done for credit card transactions.
Synchronet: Synchronet Bulletin Board System Software is a software package that can turn a personal computer into a custom online service supporting multiple simultaneous users with hierarchical message and file areas, multi-user chat, and the ever-popular BBS door games.
TCP: TCP (Transmission Control Protocol) is a set of rules (protocol) used along with the Internet Protocol (IP) to send data in the form of message units between computers over the Internet. While IP takes care of handling the actual delivery of the data, TCP takes care of keeping track of the individual units of data (called packets) that a message is divided into for efficient routing through the Internet.
TCP is known as a connection-oriented protocol, which means that a connection is established and maintained until such time as the message or messages to be exchanged by the application programs at each end have been exchanged. TCP is responsible for ensuring that a message is divided into the packets that IP manages and for reassembling the packets back into the complete message at the other end. In the Open Systems Interconnection (OSI) communication model, TCP is in layer 4, the Transport Layer.
UDP: UDP (User Datagram Protocol) is a communications protocol that offers a limited amount of service when messages are exchanged between computers in a network that uses the Internet Protocol (IP). UDP is an alternative to the Transmission Control Protocol (TCP) and, together with IP, is sometimes referred to as UDP/IP. Like the Transmission Control Protocol, UDP uses the Internet Protocol to actually get a data unit (called a datagram) from one computer to another. Unlike TCP, however, UDP does not provide the service of dividing a message into packets (datagrams) and reassembling it at the other end. Specifically, UDP doesn't provide sequencing of the packets that the data arrives in. This means that the application program that uses UDP must be able to make sure that the entire message has arrived and is in the right order. Network applications that want to save processing time because they have very small data units to exchange (and therefore very little message reassembling to do) may prefer UDP to TCP. UDP provides two services not provided by the IP layer. It provides port numbers to help distinguish different user requests and, optionally, a checksum capability to verify that the data arrived intact. In the Open Systems Interconnection (OSI) communication model, UDP, like TCP, is in layer 4, the Transport Layer.
UP.Link Gateway: An UP.Link-enabled handheld device or UP.Phone connects to World Wide Web servers through a system called an UP.Link Gateway. The UP.Link device functions much like a WWW browser; the user presses keys on the phone to navigate and request URLs. The UP.Phone uses the data capabilities of conventional cellular networks to send the requests to an UP.Link Gateway, which converts them into Hypertext Transport Protocol (HTTP) requests. The UP.Link Gateway sends HTTP requests over the Internet or direct lines to web servers maintained by Unwired Planet or third-party developers.
WWW: World Wide Web (WWW)

APPENDIX B: CAMPUS UTILIZATION OF INTERNET2

Hill University Center | 172 | 6073 | 0.83% | 1,625 | 0.41%
Hoehn | 10 | 8882 | 1.21% | 4,084 | 1.04%
Holley-Mears | 47 | 584 | 0.08% | 16 | 0.00%
Humanities / Honors House / Wallace & Bell / Stephens / Geology | 41 | 7529 | 1.02% | 470 | 0.12%
Lister Hill Library | 55 | 29356 | 3.99% | 8,932 | 2.27%
Maze TTYLINES | MODEM | 1089 | 0.15% | 118 | 0.03%
McCallum Building | 6 | 32078 | 4.36% | 60,809 | 15.44%
Medical Education Building | 148 | 1238 | 0.17% | 366 | 0.09%
Medical Towers | 133 | 7295 | 0.99% | 672 | 0.17%
Mervyn Sterne Library | 156 | 6305 | 0.86% | 1,231 | 0.31%
Mortimer Jordan Hall | 61 | 1564 | 0.21% | 342 | 0.09%
Optometry Building (Peters) | 272 | 6664 | 0.91% | 1,701 | 0.43%
Pathology West Pavilion | 224 | 6332 | 0.86% | 617 | 0.16%
Paula Building | 0036A | 10 | 0.00% | 0 | 0.00%
Police Headquarters Building | 263 | 61 | 0.01% | 17 | 0.00%
Print Shop | 23 | 158 | 0.02% | 803 | 0.20%
Professional Arts Building | 219 | 258 | 0.04% | 78 | 0.02%
RUST | RUST | 324195 | 44.07% | 205,028 | 52.07%
RUST IPX | RUST-IPX | 18 | 0.00% | 0 | 0.00%
RUST PopTop Server / NASA | RUST-POP | 10 | 0.00% | 0 | 0.00%
Ryals Building | 283 | 8569 | 1.16% | 1,709 | 0.43%
School of Business | 712 | 5153 | 0.70% | 354 | 0.09%
School of Engineering | 5 | 38465 | 5.23% | 10,782 | 2.74%
School of Nursing | 134 | 814 | 0.11% | 136 | 0.03%
SHRP | 149 | 3472 | 0.47% | 142 | 0.04%
Spain Rehab and Annex | 158 | 1012 | 0.14% | 113 | 0.03%
Sparks Center | 152 | 1531 | 0.21% | 314 | 0.08%
The Children's Hospital | 16 | 15466 | 2.10% | 2,209 | 0.56%
Veteran's Administration Hospital | 178 | 67 | 0.01% | 4 | 0.00%
Volker Hall | 181 | 20253 | 2.75% | 6,955 | 1.77%
Wallace Tumor | 188 | 16517 | 2.25% | 8,963 | 2.28%
WBHM | 183 | 3648 | 0.50% | 44 | 0.01%
Webb Building | 184 | 1585 | 0.22% | | 0.24%
Worrell | 186 | 2857 | 0.39% | | 0.66%

Table 1: May 2000: Internet2 Utilization by Building
Hill University Center | 172 | 25754 | 1.33% | 8,832 | 1.26%
Hoehn | 10 | 16400 | 0.84% | 1,545 | 0.22%
Holley-Mears | 47 | 2430 | 0.13% | 75 | 0.01%
Humanities / Honors House / Wallace & Bell / Stephens / Geology | 41 | 7432 | 0.38% | 1,773 | 0.25%
Lister Hill Library | 55 | 213968 | 11.01% | 26,890 | 3.83%
Maze TTYLINES | MODEM | 730 | 0.04% | 0 | 0.00%
McCallum Building | 6 | 49031 | 2.52% | 27,609 | 3.93%
Medical Education Building | 148 | 9976 | 0.51% | 2,587 | 0.37%
Medical Towers | 133 | 24744 | 1.27% | 24,890 | 3.54%
Mervyn Sterne Library | 156 | 29480 | 1.52% | 2,800 | 0.40%
Mortimer Jordan Hall | 61 | 1565 | 0.08% | 136 | 0.02%
Optometry Building (Peters) | 272 | 16926 | 0.87% | 3,790 | 0.54%
Pathology West Pavilion | 224 | 8346 | 0.43% | 608 | 0.09%
Paula Building | 0036A | 485 | 0.02% | 43 | 0.01%
Police Headquarters Building | 263 | 394 | 0.02% | 37 | 0.01%
Print Shop | 23 | 1344 | 0.07% | 269 | 0.04%
Professional Arts Building | 219 | 256 | 0.01% | 0 | 0.00%
RUST | RUST | 735138 | 37.84% | 308,410 | 43.88%
RUST IPX | RUST-IPX | 170 | 0.01% | 0 | 0.00%
RUST PopTop Server / NASA | RUST-POP | 918 | 0.05% | 9 | 0.00%
Ryals Building | 283 | 24879 | 1.28% | 3,631 | 0.52%
School of Business | 712 | 17428 | 0.90% | 3,027 | 0.43%
School of Engineering | 5 | 34473 | 1.77% | 13,248 | 1.88%
School of Nursing | 134 | 2166 | 0.11% | 441 | 0.06%
SHRP | 149 | 8426 | 0.43% | 477 | 0.07%
Spain Rehab and Annex | 158 | 380" | 0.20% | 1,568 | 0.22%
Sparks Center | 152 | 4031 | 0.21% | 1,334 | 0.19%
The Children's Hospital | 16 | 22825 | 1.17% | 1,332 | 0.19%
Veteran's Administration Hospital | 178 | 388 | 0.02% | 80 | 0.01%
Volker Hall | 181 | 47754 | 2.46% | 8,711 | 1.24%
Wallace Tumor | 188 | 42656 | 2.20% | 7,280 | 1.04%
WBHM | 183 | 6556 | 0.34% | 271 | 0.04%
Webb Building | 184 | 8870 | | |
Worrell | 186 | 6098 | | |

Table 2: May 2001: Internet2 Utilization by Building
IP Flow Analysis of UAB's Internet2 Utilization
ECE Masters Project
Jill Gemmill
November 2001

Project Objective
• To understand utilization of UAB's connection to Internet2 networks, with a focus on applications and application performance
  - Who is using Internet2?
  - What are typical applications?
  - What is typical throughput?
  - What is maximum throughput?

Project Management
• Project design, direction, analysis and coordination - Jill Gemmill
• File transfer and conversion to SQL format - Vikram Vijay (MSEE student)
• Linux system and SQL DB administration - Chakravarthy Sannedhi (MSEE student)
• Preparation of Crystal Reports graphs - Evan Rowe and Rodney Hasty (NSF REU students)

Conference Paper
UAB's Internet2 Utilization: A Case Study Using IP Flow Analysis
Jill Gemmill, Murat Tanik, Gregg L. Vaughn, Clair W. Goldsmith, David L. Shealy
for World Conference on Integrated Design & Process Technology, June 23-28, 2002

Previous Work on this Topic
• Surya Chataut, Stan McClellan, Jill Gemmill, "Tools for Application Performance Measurement", Proceedings of the Society for Design and Process Science, June 2000.
• "Blind Men Feeling the Elephant: Managing Network Application Performance -- Standards, Tools, and Challenges", Jill Gemmill, Proceedings of the Workshop on Transdisciplinary Education, Research, and Training, Integrated Design and Process Technology, June 2001

Acknowledgements
• NSF Grant: NCRI-9729500 High Performance Connection for Research Universities in Alabama (D. Shealy; J. Gemmill, S. McClellan, P. Hancock, D.
Cordes)
• NSF Grant: EPS-9720653 Alabama EPSCoR Cooperative Agreement - Infrastructure to Support Alabama and Regional Connections to Internet2 (K. Pruitt; D. Shealy, J. Gemmill, S. McClellan, P. Hancock, D. Cordes)

Road Map
• Background: What are the Internet2 networks?
• Project Requirements and Methods
• Data Collection
• Analysis
• Conclusions

Background: Internet2 Networks
• Research and Education network traffic only
• StarTap international NGIX
• Abilene national backbone
• Southeastern Universities Research Association: Southern Crossroads (SoX) and Mid-Atlantic Crossroads (MAX)
• Alabama: Gulf Central GigaPoP

Background: Abilene (Internet2) Network
• IP over SONET
• Connecting 188 universities
(Figure: Abilene Core Topology, http://www.internet2.edu/abilene/)

Background: SURA
• SURA: Southeastern Universities Research Association
• ATM OC3/OC12 GigaPoP connecting 17 universities in the Southeast
• Located on Georgia Tech campus
• On-ramp to national/international networks for our region
Background: Gulf Central GigaPoP Connection Speeds
UA: 155 Million bps
UAB: 155 Million bps
UAH: 155 Million bps
Auburn: 155 Million bps
GCG-Atlanta: 311 Million bps [6900 text pages/second]
Alabama A&M: 45 M bps
South Alabama: 45 M bps
Tuskegee: 45 M bps
(Figure label: SoX)

Background: UAB Network Architecture
(Figure: UAB network architecture diagram; legible labels: Internet2 Traffic, Cisco LS1010 ATM Switch, UA)

Analysis
(Figure: example protocol analysis; logarithmic frequency axis from 0.0001 to 10)

Analysis
(Figure: example throughput analysis; x-axis: Throughput in Mb/sec)

Results
Who is using Internet2?
  - Widely used across campus: every building had I2 traffic in both years
  - Top Users by Bandwidth: Network Services, McCallum, Campbell Hall (CIS & Physics), Health System, Engineering, Lister Hill Library

Results
What are typical applications?
• Frequency: WWW, Email, News, FTP-data, and various peer-to-peer applications
• Bandwidth: News, WWW, FTP-data, peer-to-peer, and streaming audio/video
• 95% of bandwidth is TCP
• About half of flows are UDP

18 Mb/sec - once.
PITAC target: 3 Mb/sec
  May 2000: 0.2% total flows
  May 2001: 0.08% total flows
Known Issues:
• Small TCP Window Size
• Small backbone Maximum Transfer Unit

Throughput Results (Cont'd)
  - Does throughput vary by application?
• Cannot compare one application across years (hypothesis rejected)
(Figure: throughput by application, May 2000 and May 2001; categories: WWW, Mail, Dwyco, Napster)

Throughput Results (Cont'd)
• In May 2000, one host in McCallum accounted for 66% of all flows >= 3 Mb/sec
• Throughput from that host to its destinations on a host by host basis:
(Figure: throughput per destination; x-axis: Non-UAB Host)

Throughput Results (Cont'd)
• Suggests throughput determined by host capabilities; "network is out of the way"
• Improving throughput will be accomplished by focusing on hardware/operating system tuning

Other Results
• I2 usage doubled over the year (typical geometric Internet growth)
• Finding frequently used applications is a good way to learn about useful new applications from the user community

Possible Future Directions
• On-going utilization study
• Find applications that are candidates for improved throughput?
• Compare I2 usage to commodity utilization
• User-centric monitoring: wouldn't it be nice to
  - Study your average & max throughput (click on your IP number)
  - Find high throughput performers & figure out why
  - Take steps to improve throughput & see results

http://uab.contentdm.oclc.org/cdm/ref/collection/uab_ece/id/150
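The throughput thresholds reported in this study (for example, the 3 Mb/sec target) rest on a simple per-flow calculation: bits transferred divided by flow duration. The following Python sketch illustrates that arithmetic under assumptions; the function names, the sample flows, and the choice to treat zero-duration flows as 0 Mb/sec are all hypothetical and are not taken from the report's Perl and SQL queries.

```python
def throughput_mbps(num_bytes, duration_sec):
    """Throughput in megabits per second (1 megabit = 1,000,000 bits)."""
    if duration_sec <= 0:
        # Assumption: flows with no measurable duration count as 0 Mb/sec.
        return 0.0
    return num_bytes * 8 / duration_sec / 1_000_000


def share_at_or_above(flows, threshold_mbps):
    """Fraction of (bytes, seconds) flows whose throughput meets the threshold."""
    if not flows:
        return 0.0
    hits = sum(1 for b, d in flows if throughput_mbps(b, d) >= threshold_mbps)
    return hits / len(flows)


# Hypothetical sample: (bytes transferred, duration in seconds) per flow.
sample_flows = [
    (3_750_000, 10.0),   # 3.0 Mb/sec
    (18_750, 1.0),       # 0.15 Mb/sec
    (1_000_000, 100.0),  # 0.08 Mb/sec
    (500, 0.0),          # no measurable duration
]
```

With this sample, `share_at_or_above(sample_flows, 3.0)` counts one flow out of four. Applied to a full month of recorded flows, the same ratio is the kind of figure quoted above as a percentage of total flows.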
Accession Number:
edsbas.39218857
Database:
BASE

Further Information

This project was designed to study utilization of UAB's connection to Internet2 networks, with a focus on applications and application performance. Questions of particular interest were: Who is using Internet2? What are typical applications? What is typical throughput? What is maximum throughput? This report first explains what Internet2 networks are and where they came from. The UAB campus network architecture is described. The project requirements and methods are detailed, including how data was collected and analyzed. A passive monitoring device was placed at the edge of the campus so that 100% of IP flows could be recorded over a period of one year. Data from May 2000 and May 2001 were analyzed and compared. IP flow analysis summarizes performance at the application level, as experienced by the end user. Since the XNI software used for data collection was not designed for this type of study, data was moved to another server, converted to human-readable format, and placed into an SQL database. Queries were written using Perl, SQL, and Crystal Reports. The analysis included protocol analysis, top applications by frequency and by bandwidth, throughput achieved in Mb/sec by application, range of throughput distribution, and campus utilization by building. In answer to the questions from which the project originated.
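The "top applications by frequency and by bandwidth" analysis described above amounts to two grouped SQL queries over a table of flow records. The sketch below shows what such queries might look like, using Python's built-in sqlite3 module with an in-memory database. The table name, column names, and sample rows are hypothetical; the report's actual schema and queries are not reproduced here, and the original work used Perl, SQL, and Crystal Reports.

```python
import sqlite3

# In-memory database with a hypothetical flow table: one row per IP flow.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (app TEXT, protocol TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?)",
    [
        ("WWW", "TCP", 40_000),
        ("WWW", "TCP", 60_000),
        ("DNS", "UDP", 200),
        ("DNS", "UDP", 300),
        ("DNS", "UDP", 250),
        ("News", "TCP", 900_000),
    ],
)


def top_by_frequency(conn, limit=3):
    """Applications ranked by number of flows."""
    return conn.execute(
        "SELECT app, COUNT(*) AS n FROM flows "
        "GROUP BY app ORDER BY n DESC, app LIMIT ?",
        (limit,),
    ).fetchall()


def top_by_bandwidth(conn, limit=3):
    """Applications ranked by total bytes transferred."""
    return conn.execute(
        "SELECT app, SUM(bytes) AS total FROM flows "
        "GROUP BY app ORDER BY total DESC, app LIMIT ?",
        (limit,),
    ).fetchall()
```

In this made-up sample, DNS leads by flow count while News leads by bytes, which mirrors the report's observation that the most frequent applications and the most bandwidth-intensive ones are not the same list.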