Academics

Explores database administration by examining the RDBMS engine and studies advanced techniques for managing traditional data: tuning and optimizing performance, maximizing throughput, and designing fault-tolerant systems. Provides hands-on experience with the Oracle DBMS.

Advanced Database Management Systems (ITEC 541)

  • Relational Constructs of Data Manipulation
    • Review of Conceptual Underpinnings of Relational Databases with emphasis on data independence and its impact on query languages
    • The Relational Algebra
    • Advanced SQL
    • Implementation of retrieval language constructs (see the sketch after this list)
  • Physical Database Implementations
    • Storage and File Structures
    • Query tuning, indexing, and hashing
    • Query Processing with emphasis on query optimization
    • Enterprise Database Tuning Opportunities
  • Advanced Logical Design Issues
    • Advanced Constraints, Types, and Assertions
    • Concurrency and Client/Server Systems, Transactions, Transaction Isolation Levels
    • Temporal Databases and Flashback
    • Missing Information
    • Object Relational Databases
    • Large Objects (LOBs)
  • Issues in Database Security
    • User Accounts, Roles, Profiles, and Privileges
    • Authentication
    • SQL Injection, Inference, and other common attacks
    • Data and Password Encryption, Password Policies
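
The relational algebra topics above lend themselves to a small worked example. The following sketch, written in Scala over plain in-memory collections rather than Oracle or any particular DBMS, implements selection, projection, and natural join; the employee and department relations and their attribute names are hypothetical.

```scala
// Minimal relational-algebra sketch over in-memory relations.
// A tuple is a Map from attribute name to value; a relation is a sequence of tuples.
object RelationalAlgebraSketch {
  type Tuple    = Map[String, Any]
  type Relation = Seq[Tuple]

  // Selection: keep tuples satisfying a predicate.
  def select(r: Relation)(p: Tuple => Boolean): Relation = r.filter(p)

  // Projection: keep only the named attributes (duplicates removed, as in set semantics).
  def project(r: Relation, attrs: Set[String]): Relation =
    r.map(t => t.filter { case (k, _) => attrs(k) }).distinct

  // Natural join: combine tuples that agree on all shared attribute names.
  def naturalJoin(r: Relation, s: Relation): Relation =
    for {
      t <- r
      u <- s
      shared = t.keySet intersect u.keySet
      if shared.forall(a => t(a) == u(a))
    } yield t ++ u

  def main(args: Array[String]): Unit = {
    // Hypothetical sample relations.
    val emp: Relation = Seq(
      Map("empId" -> 1, "name" -> "Ada",  "deptId" -> 10),
      Map("empId" -> 2, "name" -> "Alan", "deptId" -> 20))
    val dept: Relation = Seq(
      Map("deptId" -> 10, "deptName" -> "Research"),
      Map("deptId" -> 20, "deptName" -> "Operations"))

    // sigma_{deptId=10}(emp), joined with dept, projected on name and deptName.
    val result = project(
      naturalJoin(select(emp)(_("deptId") == 10), dept),
      Set("name", "deptName"))
    result.foreach(println)
  }
}
```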

Students who complete this course will be able to:

  • Describe the key attributes of a data retrieval language.  Demonstrate proficiency with the relational algebra or other mathematically based retrieval language.
  • Describe and apply basic concepts of file organization including the properties and architecture of physical devices such as disk drives.
  • Describe and compare methods for efficient data retrieval of persistent data including indexes, hashing, and sequential access.
  • Describe and explain the steps in query processing and evaluate execution plans.
  • Implement operations/algorithms from the relational algebra or other retrieval language.
  • Explain the purpose of query optimization, recognize opportunities for optimization, draw and optimize expression trees.
  • Perform tuning tasks on an enterprise level DBMS.
  • Construct appropriate designs for databases that present significant temporal, null value, or other complexities.
  • Explain the ACID properties of transaction control. Implement transactions with those properties in stored procedures. Implement triggers for complex constraints. (See the transaction sketch after this list.)
  • Describe and use current extensions of relational database technology such as object-relational or XML extensions.
  • Explain theoretical and practical uses and limitations of nested tables, arrays, and user-defined types in relational databases.
  • Explain options for how large objects (video clips, pictures, documents, etc.) are stored in and retrieved from a database and the advantages and disadvantages of each.
  • Implement a database application that uses large objects.
  • Describe fundamental challenges associated with database security and identify solutions to those challenges.
  • Analyze and manage typical privilege systems for database systems.
  • Employ data encryption techniques on an RDBMS.
  • Implement password and/or other authentication policies on an RDBMS.
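
As a companion to the transaction and SQL-injection outcomes above, here is a minimal, hedged sketch in Scala using the standard JDBC API. It shows a client-side transaction (both updates commit or roll back together) and a parameterized PreparedStatement in place of string concatenation; the connection details and the accounts table are hypothetical, and a course assignment would more likely implement the same logic in an Oracle stored procedure.

```scala
import java.sql.{Connection, DriverManager}

// Hedged sketch: transfers funds atomically between two rows of a hypothetical
// "accounts" table, using an explicit JDBC transaction and bind variables
// (a PreparedStatement) instead of string concatenation, which defends
// against SQL injection.
object TransferSketch {
  def transfer(url: String, user: String, pass: String,
               from: Int, to: Int, amount: BigDecimal): Unit = {
    val conn: Connection = DriverManager.getConnection(url, user, pass)
    try {
      conn.setAutoCommit(false)                       // start an explicit transaction
      val debit  = conn.prepareStatement(
        "UPDATE accounts SET balance = balance - ? WHERE account_id = ?")
      val credit = conn.prepareStatement(
        "UPDATE accounts SET balance = balance + ? WHERE account_id = ?")

      debit.setBigDecimal(1, amount.bigDecimal); debit.setInt(2, from)
      credit.setBigDecimal(1, amount.bigDecimal); credit.setInt(2, to)

      if (debit.executeUpdate() != 1 || credit.executeUpdate() != 1)
        throw new IllegalStateException("account not found")

      conn.commit()                                   // both updates succeed or neither does
    } catch {
      case e: Exception =>
        conn.rollback()                               // preserve atomicity on failure
        throw e
    } finally {
      conn.close()
    }
  }
}
```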

Advanced examination of the principles of database systems, studying techniques for modeling, managing, and analyzing large data sets. The course covers the architectural components that support enterprise-level business intelligence, with in-depth coverage of the dimensional model, data integration, data visualization, performance dashboards, machine learning algorithms, and the application of common data mining techniques. Students use Tableau to explore, analyze, and visualize a dataset and communicate their results in a paper, video, and presentation. Students must have completed a database course that includes hands-on experience with the relational model, SQL, security, database design, and stored procedures.

Data Warehousing and Visualization (ITEC 542)

  • Introduction to business intelligence
  • Data Warehousing
    • Dimensional modeling
    • Warehouse aggregates
    • Data quality
    • Extract, transform, and load (ETL) process
    • Physical design   
    • Data warehousing lifecycle
  • Reporting and data analysis
    • Online analytical processing (OLAP)
    • Commercial query and reporting tools
  • Data mining
    • Data mining methodology
    • Statistical methods
    • Decision trees
    • Association rules
    • Clustering
    • Neural networks
    • Data preparation

Students who complete this course will be able to:

  • Design and develop a Star schema and describe best practices for dimensional modeling.
  • Design and develop a basic ETL process and explain the challenges of the ETL process.
  • Identify and develop valuable aggregates for a given problem.
  • Design and develop different types of reports to meet varied reporting requirements.
  • Describe the limitations of SQL with respect to analytical reports.
  • Describe common data mining tasks.
  • Describe data mining techniques and implement at least one technique.
  • Explain the value of transactional data with respect to business intelligence.
  • Explain the importance of data quality and the challenges of producing high quality data.
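
One concrete data mining technique named in the outcomes above is association rules. The sketch below computes support and confidence for single-item rules over hypothetical market-basket data in plain Scala; it is illustrative only and does not involve Tableau or a warehouse.

```scala
// Minimal association-rule sketch: support and confidence for rules of the
// form {a} => {b}, computed over hypothetical market-basket transactions.
object AssociationRulesSketch {
  def main(args: Array[String]): Unit = {
    val baskets: Seq[Set[String]] = Seq(
      Set("bread", "milk"),
      Set("bread", "diapers", "beer"),
      Set("milk", "diapers", "beer"),
      Set("bread", "milk", "diapers", "beer"),
      Set("bread", "milk", "diapers"))

    val n = baskets.size.toDouble
    def support(items: Set[String]): Double =
      baskets.count(b => items.subsetOf(b)) / n

    val items = baskets.flatten.toSet
    val rules = for {
      a <- items; b <- items if a != b
      supAB = support(Set(a, b))
      if supAB >= 0.4                              // minimum-support threshold
      conf  = supAB / support(Set(a))              // confidence of {a} => {b}
    } yield (a, b, supAB, conf)

    rules.toSeq.sortBy(-_._4).foreach { case (a, b, s, c) =>
      println(f"{$a} => {$b}  support=$s%.2f  confidence=$c%.2f")
    }
  }
}
```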

Investigates techniques for managing massive volumes of data and studies the design of scalable systems, on-demand computing, and cloud computing. Provides hands-on experience with Hadoop and NoSQL databases.

Distributed Systems Design (ITEC 641)


Topics include:

  • Introduction to distributed databases
    • Need for distributed databases
    • Types of distributed architectures
    • Challenges with distributed computing architectures
  • Comparison of various distributed DBMS architectures
  • Scalability and reliability of distributed architectures
  • Performance tuning on distributed databases.

Students who complete this course will be able to:

  • Describe and apply general principles and concepts of distributed computing and distributed computing networks.
  • Design and implement distributed databases (a consistent-hashing sketch follows this list).
  • Compare and contrast consolidated and distributed query processing and concurrency control.
  • Describe distributed database management reliability.
  • Describe NoSQL solutions for voluminous semi-structured data.
  • Describe and apply techniques to tune database management systems.
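
A building block behind several of these outcomes (designing distributed databases, NoSQL storage) is consistent hashing, which assigns keys to nodes so that adding or removing a node remaps only a fraction of the data. The sketch below is illustrative Scala, not the algorithm of any particular product; the node names and virtual-node count are arbitrary.

```scala
import scala.collection.immutable.TreeMap
import scala.util.hashing.MurmurHash3

// Consistent-hashing sketch: place each node at several points ("virtual nodes")
// on a hash ring; a key is stored on the first node clockwise from its hash.
// Adding or removing a node only remaps the keys between neighbouring points.
class HashRing(nodes: Seq[String], vnodes: Int = 100) {
  private def h(s: String): Int = MurmurHash3.stringHash(s)

  private val ring: TreeMap[Int, String] =
    TreeMap(nodes.flatMap(n => (0 until vnodes).map(i => h(s"$n#$i") -> n)): _*)

  // Node responsible for a key: first ring position >= hash, wrapping around.
  def nodeFor(key: String): String = {
    val it = ring.iteratorFrom(h(key))
    (if (it.hasNext) it.next() else ring.head)._2
  }
}

object HashRingDemo extends App {
  val ring = new HashRing(Seq("node-a", "node-b", "node-c"))
  Seq("order:1001", "order:1002", "customer:42").foreach { k =>
    println(s"$k -> ${ring.nodeFor(k)}")
  }
}
```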

 

Examines advanced techniques for tuning and optimizing performance. Studies load balancing, clustering, mainframe systems, and other methods of managing traditional data and big data. 

Database Performance and Scalability (ITEC 643)

  1. Basic Database Tuning
    • Review of Indexing and Hashing Schemes
    • Tuning SQL
    • Tuning Memory and Storage Structures and Parameters
    • Tuning Network Communication
  2. Load Testing and Load Balancing
    • Methods for Load Balancing
    • Methods for Load Testing
  3. Virtualization and Cloud Architectures
    • Purpose of Virtualization
    • VMware Details
    • Cloud Architectures
  4. Big Data
    • Defined
    • Data at Rest vs. Streams
    • Current Tools
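
The Big Data topics above are often introduced through the MapReduce model. The sketch below expresses word count with plain Scala collections so that the map, shuffle, and reduce phases are explicit; Hadoop or Spark would distribute the same phases across a cluster.

```scala
// MapReduce word count expressed with plain Scala collections.
// The three phases mirror what Hadoop or Spark would distribute across nodes.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val documents = Seq(
      "big data at rest",
      "data in motion data streams")

    // Map phase: emit (word, 1) pairs from each input record.
    val mapped: Seq[(String, Int)] =
      documents.flatMap(_.split("\\s+")).map(word => (word, 1))

    // Shuffle phase: group all pairs sharing the same key.
    val shuffled: Map[String, Seq[Int]] =
      mapped.groupBy(_._1).map { case (w, pairs) => (w, pairs.map(_._2)) }

    // Reduce phase: sum the counts for each key.
    val counts: Map[String, Int] =
      shuffled.map { case (w, ones) => (w, ones.sum) }

    counts.toSeq.sortBy(-_._2).foreach { case (w, c) => println(s"$w\t$c") }
  }
}
```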

Students who complete this course will be able to:

  1. Describe and apply techniques for tuning database systems.
  2. Design and apply techniques for load testing and load balancing distributed database systems.
  3. Identify and describe the advantages and challenges associated with virtualization and cloud computing for database systems.
  4. Design and assess Big Data architectures and their performance.
  5. Describe and apply current techniques for Big Data Storage.
  6. Describe the advantages and limitations of NoSQL solutions to distributed data.
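
Outcome 2 above concerns load testing and load balancing. A minimal illustration of one balancing policy, least connections, is sketched below in Scala; the replica names are hypothetical, and a production system would implement the policy in a driver, proxy, or cluster manager.

```scala
import scala.collection.mutable

// Least-connections load balancing sketch: route each new request to the
// backend currently handling the fewest active connections.
class LeastConnectionsBalancer(backends: Seq[String]) {
  private val active = mutable.Map(backends.map(_ -> 0): _*)

  // Pick the least-loaded backend and record the new connection.
  def acquire(): String = synchronized {
    val (backend, _) = active.minBy(_._2)
    active(backend) += 1
    backend
  }

  // Release a connection when the request completes.
  def release(backend: String): Unit = synchronized {
    active(backend) -= 1
  }
}

object BalancerDemo extends App {
  val lb = new LeastConnectionsBalancer(Seq("db-replica-1", "db-replica-2", "db-replica-3"))
  val chosen = (1 to 6).map(_ => lb.acquire())
  println(chosen.mkString(", "))   // requests spread across the replicas
}
```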

Studies reliability, security, and privacy issues related to storing, transmitting, and processing large data sets. Studies techniques to secure databases and system infrastructure and methods to assure data integrity through fault tolerance and data recovery.

Information Security and Assurance (ITEC 645)

  1. Fundamentals of information security and privacy
    • Goals of security (confidentiality, integrity, availability, authentication, non-repudiation, and accountability)
    • Vulnerabilities and exploits on DBMS and data sets (e.g., programming flaws, SQL injection, statistical inference attacks)
    • Threat modeling and security analysis
  2. Information Security with data storage and management
    • Cryptography (symmetric key, asymmetric key, secure hashes and modes of operation)
    • Secure design principles (e.g., least privilege, complete mediation, separation of privilege, least common mechanism, defense in depth)
    • Authentication
    • Access control
    • Access logs
    • Security mechanisms (e.g., perimeter security, host-based security)
    • Secure operations (backups, hardening distributed databases, disaster recovery, business continuity)
  3. Privacy
    • Statistical inference attacks and controls
    • Legal issues (e.g., HIPAA, FERPA, ECPA)
  4. Reliability
    • Failures
    • Fault tolerance

Students who complete this course will be able to:

  1. Enumerate the main goals of security and privacy including confidentiality, integrity, availability, authentication, non-repudiation, and accountability.
  2. Analyze and develop threat models for the security of database management systems, networks, and distributed database infrastructures.
  3. Analyze and develop threat models on the privacy of data (such as inference attacks).
  4. Perform security analysis on centralized and distributed database installations using techniques such as the Open Source Security Testing Methodology Manual (OSSTMM).
  5. Describe and apply cryptographic algorithms and mechanisms, including secure hashes, secret-key and public-key cryptography, and their modes of operation, to secure both stored data and data in transit across networks (a minimal AES-GCM sketch follows this list).
  6. Describe and apply standard secure design principles, including least privilege, complete mediation, least common mechanism, economy of mechanism, defense in depth, reluctance to trust, and privacy, to different database installations.
  7. Describe and deploy authentication, fine-grained access control and accountability mechanisms (such as access logs) on database management systems and distributed and centralized database installations.
  8. Describe and deploy mechanisms that provide security such as intrusion detection systems and privacy such as those that protect against statistical inference attacks on databases.
  9. Perform secure operations including backup, recovery and secure updates.
  10. Administer security by enumerating the steps of risk management and developing security policies and plans, such as acceptable usage policies and business continuity and disaster recovery plans.
  11. Enumerate and identify privacy issues of data, taking into account the federal and state laws that govern privacy, such as HIPAA, FERPA, and the Electronic Communications Privacy Act.
  12. Describe reliability mechanisms to achieve fault tolerance in distributed databases.
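
As a small illustration of outcome 5, the sketch below uses the standard Java javax.crypto API from Scala to encrypt and decrypt a value with AES in GCM mode. It is a minimal example only; key management (wallets, key stores, rotation), which dominates real deployments, is not shown.

```scala
import java.security.SecureRandom
import javax.crypto.{Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.GCMParameterSpec

// Secret-key encryption sketch using AES-GCM from the standard Java crypto API.
// Key management (wallets, KMS, rotation) is deliberately out of scope here.
object ColumnEncryptionSketch {
  private val random = new SecureRandom()

  def newKey(): SecretKey = {
    val kg = KeyGenerator.getInstance("AES")
    kg.init(256)
    kg.generateKey()
  }

  // Returns (iv, ciphertext); a fresh IV is required for every encryption.
  def encrypt(key: SecretKey, plaintext: Array[Byte]): (Array[Byte], Array[Byte]) = {
    val iv = new Array[Byte](12)
    random.nextBytes(iv)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv))
    (iv, cipher.doFinal(plaintext))
  }

  def decrypt(key: SecretKey, iv: Array[Byte], ciphertext: Array[Byte]): Array[Byte] = {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv))
    cipher.doFinal(ciphertext)
  }

  def main(args: Array[String]): Unit = {
    val key = newKey()
    val (iv, ct) = encrypt(key, "4111-1111-1111-1111".getBytes("UTF-8"))
    println(new String(decrypt(key, iv, ct), "UTF-8"))
  }
}
```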

Investigates comprehensive, enterprise-wide approaches to organize, protect, and control trusted information assets. Students gain hands-on experience with the Alation data catalog while studying techniques to govern, control, and protect data including master data management, data quality, and data integration. Studies best practices including DataOps, data mesh, and observability.

Enterprise Information Architecture (ITEC 647)

  1. Information architecture
  2. Information governance
  3. Master data management
  4. Information quality
  5. Data integration
  6. Metadata management

Students who complete this course will be able to:

  1. Explain the importance of data governance.
  2. Develop policies for protecting and securing data and information assets.
  3. Design and develop a system to protect and secure data and information.
  4. Develop a program that integrates data from multiple sources.
  5. Profile data elements.
  6. Analyze the quality of an individual data element.
  7. Analyze the overall quality of a data source.
  8. Design and develop a system to capture and manage metadata.
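
Outcomes 5 through 7 above concern profiling and assessing the quality of data elements. The sketch below computes three basic profile metrics (null rate, distinct count, and format conformance) for a single hypothetical email column in plain Scala; it is illustrative and independent of Alation or any catalog tool.

```scala
// Column-profiling sketch: basic data quality metrics for a single element.
// Column values are modeled as Option[String]; the email column is hypothetical.
object ProfileSketch {
  final case class ColumnProfile(rows: Int, nullRate: Double,
                                 distinct: Int, formatConformance: Double)

  def profile(values: Seq[Option[String]], formatOk: String => Boolean): ColumnProfile = {
    val n       = values.size
    val present = values.flatten
    ColumnProfile(
      rows              = n,
      nullRate          = (n - present.size).toDouble / n,
      distinct          = present.distinct.size,
      formatConformance = if (present.isEmpty) 0.0
                          else present.count(formatOk).toDouble / present.size)
  }

  def main(args: Array[String]): Unit = {
    val emails = Seq(Some("ada@example.com"), Some("alan@example.com"),
                     Some("not-an-email"), None, Some("ada@example.com"))
    val emailPattern = "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$".r
    val p = profile(emails, v => emailPattern.findFirstIn(v).isDefined)
    println(f"rows=${p.rows} nullRate=${p.nullRate}%.2f " +
            f"distinct=${p.distinct} formatOK=${p.formatConformance}%.2f")
  }
}
```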

Explores data structures and algorithms for storing and processing traditional data and big data. Provides hands-on experience with Spark and Scala.

Distributed Algorithms (ITEC 660)

  1. Analysis of algorithms
    • Time and space
    • Amortized analysis
    • I/O bottlenecks
  2. Memory hierarchy
    • Caching
    • External memory organization (disk organization)
  3. Sorting and searching algorithms (e.g., counting sort; see the sketch after this list)
  4. External memory and cache-oblivious data structures and algorithms (e.g., types of B-trees)
  5. Hashing
  6. Algorithms that exploit temporal and spatial locality
  7. Succinct data structures (rank, tries, suffix arrays) to store data compactly.
  8. Advanced topics, such as
    • Data compression
    • Pattern matching
    • Search engine indexing
    • NP completeness
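
Counting sort, named in topic 3 above, is a convenient first example because it sorts small integer keys using only sequential passes over the input. A minimal Scala sketch:

```scala
// Counting sort sketch: sorts non-negative integer keys in O(n + k) time and
// O(k) extra space, using only sequential passes over the input.
object CountingSortSketch {
  def countingSort(keys: Array[Int], maxKey: Int): Array[Int] = {
    val counts = new Array[Int](maxKey + 1)
    keys.foreach(k => counts(k) += 1)          // histogram pass

    val out = new Array[Int](keys.length)
    var next = 0
    for (k <- 0 to maxKey; _ <- 0 until counts(k)) {  // emit each key counts(k) times
      out(next) = k
      next += 1
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val data = Array(5, 3, 9, 3, 0, 7, 5, 5)
    println(countingSort(data, maxKey = 9).mkString(", "))   // 0, 3, 3, 5, 5, 5, 7, 9
  }
}
```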

Students who complete this course will be able to:

  1. Compare and contrast temporal and spatial efficiency of distributed algorithms and data structures used to store, query and process medium to large datasets.
  2. Describe and analyze the performance issues of the different memory organizations used to store large data sets.
  3. Describe and apply data structures and distributed algorithms that achieve efficiencies in query and processing times of medium to large datasets.
  4. Describe and apply data structures and algorithms that store data compactly.
  5. Describe current distributed algorithms and data structures used to store, query and analyze medium to large datasets.

Studies techniques for analyzing structured, unstructured, and semi-structured data at rest and in motion. Examines non-traditional data sources including social media, mobile devices, and sensors; emerging analytical applications; real-time processing of data streams; and massively parallel processing technology.

Machine Learning and AI (ITEC 685)

  1. Databases and their evolution
  2. Big data technology, NoSQL
  3. AI techniques
  4. Logic rules, uncertainty
  5. Bayes' rule, Naïve Bayes, Bayesian networks
  6. Sentiment analysis
  7. Association rule mining
  8. Learning latent models, machine learning
  9. Clustering, classification
  10. Linear and logistic regression
  11. Least squares, optimization (see the sketch after this list)
  12. Non-linear models, neural networks
  13. Dimensionality reduction
  14. Anomaly detection
  15. Recommender systems
  16. Parallel computing, MapReduce
  17. Analytics tools
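
Topics 10 and 11 above (regression and least squares) can be illustrated with the closed-form fit of a one-variable linear model. The sketch below uses plain Scala and made-up data; the formulas are the usual slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x).

```scala
// Least-squares sketch: fit y = a + b*x in closed form for one predictor.
// slope b = cov(x, y) / var(x); intercept a = mean(y) - b * mean(x).
object LeastSquaresSketch {
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.size == ys.size && xs.nonEmpty)
    val mx   = xs.sum / xs.size
    val my   = ys.sum / ys.size
    val cov  = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val varX = xs.map(x => (x - mx) * (x - mx)).sum
    val slope = cov / varX
    (my - slope * mx, slope)              // (intercept a, slope b)
  }

  def main(args: Array[String]): Unit = {
    val x = Seq(1.0, 2.0, 3.0, 4.0, 5.0)
    val y = Seq(2.1, 4.1, 6.2, 7.9, 10.0)   // roughly y = 2x
    val (a, b) = fit(x, y)
    println(f"y = $a%.2f + $b%.2f * x")
  }
}
```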

Students who complete this course will be able to:

  1. Categorize data into groups based on attributes.
  2. Classify information based on existing data.
  3. Identify the relationship between elements of a decision.
  4. Understand optimization: maximizing certain outcomes while minimizing others.
  5. Develop decision logic or rules that will produce the desired action.
  6. Predict future events based on an appropriate model.
  7. Seek out subtle data patterns, such as those used in fraud detection models, to answer questions about customer performance.
  8. Understand the requirements of big data analytics.
  9. Simulate human behavior or reaction to given stimuli or scenarios.

Students consult with their advisor and mentors to design a project that demonstrates their skills and experience and prepares them for their chosen career path. Each student works with at least one industry mentor. Students design, develop, and test a data pipeline. Projects focus on the aspects most relevant to each student (e.g., cloud computing, data engineering, data analysis, predictive modeling, and security). Students propose and defend their projects to the Capstone Project Committee and present their results in a public forum.

Practicum in Data Engineering: Capstone Project (ITEC 695)

Each section has one instructor. Each student will have a project advisor and an industry mentor. Each student will design, implement, and test a substantial component of an information system. Multiple students may work together to develop an information system involving multiple components.