Hello everyone. In the previous post I introduced Animation Listener and Custom Animator with Lottie.
In this post I will show how to adjust dynamic properties. You can update an animation's properties while it is running, for purposes such as:
Changing the theme.
Changing size and timing.
Responding to error or success events.
This post covers:
Understanding After Effects well enough to apply it here.
What do you need in order to change a property?
How do you do it?
Understanding After Effects
To adjust properties in Lottie, you first need to understand what those properties are.
They are inherited from properties in After Effects. An After Effects composition is a collection of layers, each with its own timeline. Objects in a layer have properties such as name, color, and size. Lottie can find these objects and properties using a KeyPath.
What do you need in order to change a property?
To change a property at runtime, you need:
KeyPath
LottieProperty
LottieValueCallback
KeyPath
A KeyPath targets one specific piece of content or a whole set of contents to be changed. It is defined by a list of strings that correspond to the content hierarchy in After Effects.
A KeyPath can contain the specific name of the content or wildcards:
Wildcard *: matches any single content name at its position in the KeyPath.
Globstar **: matches zero or more layers.
KeyPath resolution
A KeyPath can store an internal reference to the content it resolves to. A newly created KeyPath object is unresolved. LottieDrawable and LottieAnimationView have a resolveKeyPath() method that takes a KeyPath and returns a list of zero or more resolved KeyPaths, each of which resolves to a single piece of content internally. You can use this to discover the structure of your animation: in a development environment, resolve new KeyPath("**") and log the returned list. However, you should not use ** by itself with a ValueCallback, because it would be applied to every piece of content in your animation. If you resolve your KeyPaths and then want to add a value callback, use the KeyPaths returned from that method: they are already resolved internally and do not need to search for the content again.
LottieProperty
LottieProperty is the set of properties that can be set. They correspond to values in After Effects.
You can look up the available properties here.
ValueCallback
A ValueCallback is called every time the animation is rendered. It provides:
The start frame of the current keyframe.
The end frame of the current keyframe.
The start value of the current keyframe.
The end value of the current keyframe.
The progress from 0 to 1 within the current keyframe, without any time interpolation.
The progress within the current keyframe with the keyframe's interpolator applied.
The progress in the overall animation from 0 to 1.
There are also helper ValueCallback subclasses such as LottieStaticValueCallback, which takes a single value in its constructor and always returns that value.
ValueCallback classes
LottieValueCallback: set a static value in the constructor or override getValue().
LottieRelativeTYPEValueCallback: set a static offset in the constructor or override getOffset(). TYPE stands for the value type of the property.
LottieInterpolatedTYPEValue: supply a start value, an end value, and an optional interpolator, and the value is interpolated automatically over the whole animation.
How do you do it?
I will keep using trail-loading.json from the previous post.
First, look at its properties and hierarchy:
Alternatively, you can use the Lottie JSON Editor tool.
(The file I use above has 5 layers; as for yours…). Below are a few example changes.
Changing a property in one layer
I will change the color of Ellipse 1 ("nm": "Ellipse 1") inside Shape Layer 1 ("nm": "Shape Layer 1") to red, as follows:
The properties you can change are listed below for reference:
Transform:
TRANSFORM_ANCHOR_POINT
TRANSFORM_POSITION
TRANSFORM_OPACITY
TRANSFORM_SCALE
TRANSFORM_ROTATION
Fill:
COLOR (non-gradient)
OPACITY
COLOR_FILTER
Stroke:
COLOR (non-gradient)
STROKE_WIDTH
OPACITY
COLOR_FILTER
Ellipse:
POSITION
ELLIPSE_SIZE
Polystar:
POLYSTAR_POINTS
POLYSTAR_ROTATION
POSITION
POLYSTAR_OUTER_RADIUS
POLYSTAR_OUTER_ROUNDEDNESS
POLYSTAR_INNER_RADIUS (star)
POLYSTAR_INNER_ROUNDEDNESS (star)
Repeater:
All transform properties
REPEATER_COPIES
REPEATER_OFFSET
TRANSFORM_ROTATION
TRANSFORM_START_OPACITY
TRANSFORM_END_OPACITY
Layers:
All transform properties
TIME_REMAP (composition layers only)
That is the end of this post; look out for the next one. Thank you for taking the time to read it. I would love to hear your feedback ^^
Hello everyone. In the previous post I introduced Lottie and how to use it on Android.
In this post I will introduce Animation Listener and Custom Animator with Lottie.
Animation Listener
When using Lottie, there are many cases where you need to do other work alongside the animation. Some concrete cases:
Opening a new screen after a Lottie animation finishes.
Updating a value while a Lottie animation is running.
Adjusting the speed or duration of the animation.
The valueAnimator parameter above is an instance of the ValueAnimator class from the Android SDK. It tells you the current state and the current time of the animation.
Now for a small exercise: in the previous post I used Lottie for a loading animation. In this post I will run it together with a ProgressBar while loading and show the progress value on screen.
Adding Lottie to an app to get smooth, lively visuals is simple, isn't it? 😉 But in many cases the integration gets trickier, for example when driving Lottie from a download, a scroll position, or a gesture… For these cases you need a custom animator that fits the problem. The following example should make this easier to picture:
//Custom animation speed or duration.
val animator = ValueAnimator.ofFloat(0f, 2f)
animator.addUpdateListener { valueAnimator: ValueAnimator ->
loading_animation.speed = valueAnimator.animatedValue as Float
}
animator.start()
Here I adjust the speed of the loading animation from 0 to 2 using an Animator.
Custom Animator
In the next post I will show how to adjust dynamic properties. There will be plenty of interesting things, so stay tuned.
Thank you for taking the time to read this post. I would love to hear your feedback ^^
Hi everyone, it's me again. 🙂
For a change of pace after the series on animation, today I will introduce Lottie for Android.
After those animation posts, have your apps become more attractive and lively? I'm sure they have. 🙂
Still, some picky users may not be fully satisfied with how smooth the animations are. Don't worry, Lottie can help with that.
Your app will look smoother, livelier, and more polished. Excited to learn more? Let's go…
This post covers:
Introduction to Lottie
How to use Lottie on Android
Introduction to Lottie
Lottie is an open-source animation library built by Airbnb. It works on Android (from Jelly Bean, API 16), iOS, React Native, and the web. It parses animations that were created in Adobe After Effects and exported as JSON through Bodymovin. Developers on each platform then use the Lottie library to render those animations on that platform.
How to use Lottie on Android
First, prepare a JSON file (exported from Adobe After Effects through Bodymovin, as described above).
This is usually provided by your project's design team, or you can create it yourself.
You can also find many ready-made animations here.
I have prepared trail-loading.json for this post.
Once you have the JSON file, create a project and add the following dependency to build.gradle:
Create an assets folder (app/src/main/assets) and copy the JSON file into it. I will copy my trail-loading.json.
Next, add the animation to the XML layout where you want it displayed. Here I add it to my activity's layout XML.
Simple, right? Now take a look at the result.
Trail-Loading
At this point you may ask: can I control the animation and listen to it? Of course you can. I will cover that in the next post, so stay tuned.
Thank you for taking the time to read this post. I would love to hear your feedback ^^
One of the basic problems in app development is passing data between view controllers. There are many ways to do this, such as protocols, notifications,… In this post I will introduce one more: using a callback.
This post assumes you understand closures. If you do not, you can read about them here:
Every function has a specific type, made up of the types of its parameters and the type of its return value.
In the func above, there are no parameters and no return value, so its function type is () -> ().
For a func with no return value, you can equivalently say that it returns Void.
func calculate(a: Int, b: Int) -> Int {
    return a + b
}
// This function has the type (Int, Int) -> Int
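To make function types concrete, here is a minimal sketch. The greet function is my own stand-in for the parameterless func mentioned earlier (the original example is not shown), and the constant names are hypothetical.

```swift
// Hypothetical stand-in for a func with no parameters and no return value.
func greet() {
    print("Hello")
}

func calculate(a: Int, b: Int) -> Int {
    return a + b
}

// A function can be stored in a constant whose type is the function type.
// () -> Void and () -> () are the same type.
let action: () -> Void = greet
let operation: (Int, Int) -> Int = calculate

action()                     // prints "Hello"
let total = operation(3, 4)  // argument labels are dropped when calling through a function value
print(total)                 // prints 7
```

Being able to store a function in a variable like this is what makes the callback approach in the rest of this post possible.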
This post will not go deep into function types; it focuses on passing data with callbacks. So what is a callback?
What is a callback?
A callback can be thought of as a closure assigned to a variable.
To pass data with a callback, declare the callback with a function type whose parameters have the types of the data you want to pass.
var onCalculate: (Int, Int) -> (Int)
The above declares a callback of type (Int, Int) -> Int. This callback passes two Int values, and whoever receives them processes the data and returns an Int.
var onPrint: (String) -> Void
In this example the callback passes a String, and the receiver processes it without returning anything (Void).
Note: to pass a different kind of data, simply change the callback's type to the type you want to pass.
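Here is a small sketch of how such callbacks might be assigned and called. The closure bodies, the onFinish callback, and the printed/finished arrays are my own illustrative assumptions, not part of the original example.

```swift
// A callback that receives two Ints and returns an Int.
var onCalculate: (Int, Int) -> Int = { a, b in a + b }

// A callback that receives a String and returns nothing.
// The array just records what the callback received, for illustration.
var printed: [String] = []
var onPrint: (String) -> Void = { message in printed.append(message) }

let sum = onCalculate(2, 3)   // 5
onPrint("sum = \(sum)")

// An optional callback starts out as nil, so calling it through ?
// does nothing until a closure has been assigned.
var finished: [Int] = []
var onFinish: ((Int) -> Void)?
onFinish?(sum)                // skipped: no closure assigned yet
onFinish = { value in finished.append(value) }
onFinish?(sum)                // now the closure runs
```

The optional form is the one used for passing data between view controllers, because the receiving side assigns the closure only after the sending object has been created.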
Passing data with a callback
Suppose there are two view controllers:
View Controller 1 has a label to display the result and a button to push View Controller 2.
View Controller 2 computes the sum of two Int values and returns the Int result to View Controller 1 for display. In other words, VC2 wants to pass one Int value.
After receiving the data from VC2, View Controller 1 shows it in the label.
VC1:
class FirstViewController: UIViewController {
    @IBOutlet weak var myLabel: UILabel!

    override func viewDidLoad() {
        super.viewDidLoad()
    }

    @IBAction func didTapButtonGoNext(_ sender: Any) {
        let storyboard = UIStoryboard(name: "Main", bundle: nil)
        let secondVC = storyboard.instantiateViewController(withIdentifier: "SecondViewController") as! SecondViewController
        // Write the code that displays the result here
        navigationController?.pushViewController(secondVC, animated: true)
    }
}
VC2:
class SecondViewController: UIViewController {
    // 1
    var completionSum: ((Int) -> (Void))?

    override func viewDidLoad() {
        super.viewDidLoad()
        // 3
        let result = calculate(num1: 9, num2: 8)
        completionSum?(result)
        // 4
        navigationController?.popViewController(animated: true)
    }

    // 2
    func calculate(num1: Int, num2: Int) -> Int {
        return num1 + num2
    }
}
1. Declare a callback of type (Int) -> (Void): it passes an Int, and the receiver handles it without returning anything.
2. Declare func calculate to add two numbers and return the Int result.
3. Add 9 and 8, then call completionSum?(result) to pass the result.
The ? after the callback is there because the callback is optional. If the callback has not been assigned yet, the call is simply skipped at runtime and nothing happens. So make sure you assign the callback before it is called.
4. Pop back to VC1 to display the result.
Now go back to VC1 and fill in the part that was left blank: set completionSum for VC2 by assigning it a closure of the same type.
That closure is where you handle the data received from VC2. After receiving the Int result, it handles the data and returns nothing, because the callback's type is (Int) -> Void.
When VC2 calls completionSum to pass the result, VC1 receives it immediately and runs the label-updating code.
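The flow can be sketched without UIKit as follows. FirstScreen and SecondScreen are hypothetical stand-ins for the two view controllers, labelText plays the role of myLabel.text, and run() plays the role of viewDidLoad; this is an illustration under those assumptions, not the original code.

```swift
// Hypothetical UIKit-free stand-in for SecondViewController.
final class SecondScreen {
    // Optional callback: stays nil until the presenting screen assigns a closure.
    var completionSum: ((Int) -> Void)?

    func calculate(num1: Int, num2: Int) -> Int {
        return num1 + num2
    }

    // Plays the role of viewDidLoad in the real view controller.
    func run() {
        let result = calculate(num1: 9, num2: 8)
        completionSum?(result)  // skipped if completionSum is still nil
    }
}

// Hypothetical UIKit-free stand-in for FirstViewController.
final class FirstScreen {
    // Plays the role of myLabel.text.
    private(set) var labelText = ""

    func goNext() {
        let second = SecondScreen()
        // Assign the closure BEFORE the second screen starts its work.
        // [weak self] keeps the closure from holding a strong reference to self.
        second.completionSum = { [weak self] result in
            self?.labelText = "\(result)"
        }
        second.run()
    }
}

let first = FirstScreen()
first.goNext()
print(first.labelText)  // prints 17
```

In the real FirstViewController, the closure body would presumably update the label instead, for example self?.myLabel.text = "\(result)", assigned to secondVC.completionSum before pushViewController is called.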
The program runs in this order:
Tap the button -> create VC2 and assign its completionSum -> push VC2.
VC2 computes the result -> calls completionSum -> VC1 receives the result and updates the label -> VC2 pops back to VC1, which shows the result.
You can also set the callback by assigning an existing func that has the same type.
In VC1, declare a func as follows:
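The original snippet is not shown here, so the following is only a sketch of the idea with hypothetical UIKit-free types: Adder stands in for VC2, Display stands in for VC1, and showResult is an assumed name for the func whose type matches the callback.

```swift
// Hypothetical stand-in for VC2: owns the optional callback.
final class Adder {
    var completionSum: ((Int) -> Void)?

    func run() {
        completionSum?(9 + 8)
    }
}

// Hypothetical stand-in for VC1.
final class Display {
    private(set) var text = ""

    // Same type as the callback: (Int) -> Void.
    func showResult(_ result: Int) {
        text = "Sum: \(result)"
    }
}

let display = Display()
let adder = Adder()

// Assign the existing method instead of writing a closure inline.
// Note: a method reference like this holds a strong reference to display.
adder.completionSum = display.showResult
adder.run()
print(display.text)  // prints "Sum: 17"
```

Because the method reference captures its owner strongly, an inline closure with [weak self] may be the safer choice when the two objects can end up referencing each other.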
Robot Framework is a testing framework. It provides everything needed to build and develop a test scenario, including setup and teardown conditions, result reporting, … What makes Robot Framework attractive to testers is that you do not need to worry about even the most basic programming algorithms. All you need to do is write a test scenario out of keywords.
More specifically, Robot Framework is:
A Python-based test automation framework. Its approach is keyword-driven and data-driven, and it is aimed at end-to-end acceptance testing.
To use it, a tester only needs to write test scenarios in a keyword-driven and data-driven style.
Testers can create new higher-level keywords from existing ones, using the same syntax that is used to create test cases.
Robot Framework makes it easy to run automated tests whose scenarios are written in a tabular format. It produces execution results and logs as HTML, which makes reading and analyzing the results quicker and easier.
Robot Framework supports tagging test cases, which makes it quick and convenient to select which tests to run.
Its greatest strength is the ability to run on different operating systems without changing the test scenarios or the lower-level keywords.
2. Supporting libraries in Robot Framework
Robot Framework has many libraries that support test automation; see the list of libraries for Robot Framework at http://robotframework.org/#test-libraries .
In this article, however, we focus on two libraries: Selenium2Library and CalculatorLibrary.
2.1 Selenium2Library
Selenium2Library is used for testing web applications. It was forked from SeleniumLibrary and extended to use Selenium 2 and WebDriver.
Selenium2Library works with most common browsers, such as IE, Firefox, Safari, Chrome, … and can be used with both Python and Jython.
To run test cases with Selenium2Library, you first need to:
Install Selenium2Library.
Import Selenium2Library into your Robot test suites.
Use the Open Browser keyword to launch the browser you want to test with.
Why use Selenium2Library?
The keywords of Selenium2Library read very close to the user's natural language: whatever action you want, you just type the corresponding keyword.
Next, let's look at CalculatorLibrary, a simple calculation library used in Robot Framework's demo project. It contains only business logic and no UI.
Let's walk through a CalculatorLibrary example and run a simple test case to get a better feel for Robot Framework.
Open a terminal and change to the folder containing the files you just downloaded.
Run the test case file "keyword_driven.robot" with this command: $ robot keyword_driven.robot
The result is 4 test cases passed and 1 test case failed.
Conclusion: with Robot Framework you do not need to know programming to write test cases and scripts, unlike with other tools. For simple requirements that need a quick turnaround, Robot Framework is a suitable choice.
II. Installation
This part explains how to install RF together with Selenium on Linux and Windows.
1. Installing Robot Framework
Step 1. Install Python.
Robot Framework is built on Python, so you need to install Python first (Python 2.5 or newer; Python 2.7 is recommended).
Linux: Python usually comes with Ubuntu/Linux. To check whether Python is installed, and which version, run the following command in a terminal (Ctrl+Alt+T opens one): $ python --version. If Python is installed, you will see its version, for example Python 2.7.6.
Windows: As on Linux, open cmd and check whether Python is installed with: python --version. If it is not, go to https://www.python.org/, download the appropriate Python version, and install it.
Step 2. Install PIP (the Python package manager).
Linux: PIP is a package manager for installing Python packages. To install PIP, run:
$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py
Windows: PIP is already installed if you are using Python 2 >=2.7.9 or Python 3 >=3.4 downloaded from https://www.python.org/, but you will need to upgrade PIP with the following command:
python -m pip install -U pip
Step 3. Install the Robot Framework package using PIP.
Linux: Use the following command to install Robot Framework:
$ sudo pip install robotframework
The latest version of Robot Framework is installed automatically. To install a specific version, append it to the package name, for example:
$ sudo pip install robotframework==2.8.4
When installation finishes, check that it succeeded with: $ pybot --version. If the setup worked, you will see the Robot Framework version information.
Windows: In a Command Prompt window, change to the Python installation directory and install RF with:
pip install robotframework
After a successful installation, check with pybot --version:
2. Installing Selenium2Library
To work with WebDriver (Selenium 2) and Robot Framework, install Selenium2Library using PIP:
Linux: Run: $ sudo pip install robotframework-selenium2library
This command automatically installs its dependencies, including decorator and the Selenium packages. Once installation finishes, run $ python to open the Python interpreter:
At the prompt, import Selenium2Library with: >>> import Selenium2Library
If no error is raised, Selenium2Library was installed successfully. To leave the Python interpreter, use exit().
Windows: In a Command Prompt window, run: pip install robotframework-selenium2library. After a successful installation you will see a message like this:
3. Installing RIDE (Standalone Robot Framework Test Data Editor)
RIDE is an IDE for building tests with Robot Framework. Instead of RIDE you can also use SublimeText, IntelliJ, Eclipse, … Because RIDE is built with wxPython, you need to install the wxPython 2.8 toolkit with Unicode support to run it. Specifically:
Linux: Install wxPython with:
$ sudo apt-get install python-wxgtk2.8
$ sudo apt-get install python-wxversion
Then install RIDE with:
$ sudo pip install robotframework-ride
To verify the installation, run: $ ride.py. The RIDE application will open as follows:
Windows: In a Command Prompt window, run pip install robotframework-ride to install RIDE.
After a successful installation you will see a message like this:
Run ride.py to start RIDE.
III. Locating UI elements
1. XPath overview
XPath is a way to navigate HTML markup in order to identify the elements that WebDriver will act on.
It is a language for finding information in an XML document: you use XPath expressions to direct the search instead of recursively traversing the XML tree.
XPath defines 7 kinds of nodes: root, element, attribute, text, namespace, processing-instruction, and comment.
XPath also uses a few terms for the relationships between nodes in that model:
Parent Node: the node directly above the current node.
Child Node: the nodes directly below the current node.
Sibling: a node at the same level as the current node, sharing the same parent.
Ancestors: all nodes above the current node on the same branch.
Descendants: all nodes below the current node on the same branch.
XPath syntax:
An absolute path goes from the root of the XML document to the target; it starts with /
A relative path that can reach any matching element, wherever it is, starts with //
To match any element without knowing its name, use *.
To access an attribute of a node, prefix the attribute name with @, for example @attributeName.
Conditions on a query are written inside []
Path expressions for selecting nodes
Expression: Definition
nodeName: Selects the child nodes of the current node that are named nodeName.
/: Selects from the root node.
//: Selects matching nodes starting from the current node, no matter how deep they are.
.: Selects the current node.
..: Selects the parent of the current node.
Operators used in XPath:
Arithmetic: +, -, * (multiplication), div (floating-point division), mod (remainder)
Comparison or relational: =, != (not equal), <, <=, >, >=
Logical: and, or, plus the functions not(), true(), and false()
Combination: | (union of two node sets)
2. Getting an XPath with Firebug and FirePath
Firebug and FirePath are two Firefox add-ons that make getting an XPath quicker and easier.
Installing Firebug and FirePath:
In Firefox, choose the [Open Menu] icon -> Add-ons.
Search for and install Firebug and FirePath.
Once installed, Firebug and FirePath appear under Extensions and their icons appear on the Firefox toolbar.
Using Firebug and FirePath:
Click the bug icon -> choose the FirePath tab.
Click the arrow icon next to the bug -> then click the element whose XPath you want. The element's XPath is displayed:
Notes:
The XPath produced by FirePath is only a reference that helps you identify the element more easily.
Taking the XPath from FirePath is the simplest approach, but it does not guarantee a stable, unique locator when the web page changes between versions. You can therefore use the following axes to write XPaths that are more stable and that locate related elements more precisely:
Axis: Definition
ancestor: Selects all ancestors (parent, grandparent, and so on) of the current node.
ancestor-or-self: Selects all ancestors of the current node and the current node itself.
attribute: Selects all attributes of the current node.
child: Selects all children of the current node.
descendant: Selects all descendants (children, grandchildren, and so on) of the current node.
descendant-or-self: Selects all descendants of the current node and the current node itself.
following: Selects all nodes after the closing tag of the current node.
following-sibling: Selects all siblings after the closing tag of the current node.
namespace: Selects all namespace nodes of the current node.
parent: Selects the parent of the current node.
preceding: Selects all nodes before the opening tag of the current node, except its ancestors.
preceding-sibling: Selects all siblings before the opening tag of the current node.
self: Selects the current node.
Examples:
Ex. 1: ancestor
Show all ancestor tags that contain the div with id="identifier-shown", not including that div itself, inspected on the Gmail login page.
If you only want to show one specific ancestor tag:
Ex. 2: ancestor-or-self
Show the ancestor tags together with the div with id="identifier-shown" itself, inspected on the Gmail login page.
Ex. 3: attribute
Select all attributes of the Next button on the Gmail login page.
Ex. 4: child
Select all child tags of the div with class="input-wrapper focused".
Ex. 5: descendant
Select all children and grandchildren of the div with class="input-wrapper focused", not including that div itself.
Ex. 6: descendant-or-self
Select all children and grandchildren of the div with class="input-wrapper focused", including that div itself.
Ex. 7: following
Show all tags after the closing tag of the div with class="identifier-wrapper focused".
Show a single tag after the closing tag of the div with class="identifier-wrapper focused".
Ex. 8: following-sibling
Show the sibling tags after the closing tag of the div with class="input-wrapper focused".
Ex. 9: parent
Select the parent tag of the div with class="input-wrapper focused".
Ex. 10: preceding
Select all tags before the input with id="next", except ancestors and attributes.
This post has two main parts. First I describe the anatomy of a constraint so you can picture how it works. Then I show how to create constraints in code without Interface Builder.
The layout of a view hierarchy is defined as a series of linear equations. Each constraint represents a single equation. Your goal is to declare a series of equations that has one and only one possible solution.
Let's look at the anatomy of a constraint, using the equation RedView.leading = 1.0 x BlueView.trailing + 8.0 as the example:
This constraint says that the red view's leading edge must be 8 points after the blue view's trailing edge. Its equation has the following parts:
Item 1: The first item in the equation, in this case the red view. It must be either a view or a layout guide.
Attribute 1: The attribute to be constrained on Item 1, in this case the red view's leading edge.
Relationship: The relationship between the left and right sides of the equation. It can take one of three values: equal, greater than or equal, or less than or equal. In this case the two sides are equal.
Multiplier: The value of Attribute 2 is multiplied by this number. In this case the multiplier is 1.0. The default is 1.0, so you can omit it if you do not need to change it.
Item 2: The second item in the equation, in this case the blue view. Unlike Item 1, it can be left blank.
Attribute 2: The attribute to be constrained on Item 2, in this case the blue view's trailing edge. If Item 2 is left blank, Attribute 2 does not exist.
Constant: A constant offset, in this case 8.0. It is added to the value of Attribute 2.
Most constraints define a relationship between two items in the user interface. These items can be views or layout guides. Constraints can also define a relationship between two different attributes of a single item; for example, you can set an aspect ratio between an item's height and its width. You can also assign a constant to an item's height or width. When working with a constant value, Item 2 is left blank, Attribute 2 is set to Not an Attribute, and the multiplier is set to 0.0.
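To see the equation at work, here is a small UIKit-free sketch that evaluates item1.attribute1 = multiplier x item2.attribute2 + constant. The ConstraintEquation type, the Relation enum, and the sample coordinates are all hypothetical; Auto Layout itself solves the whole system of equations at once, which this sketch does not attempt.

```swift
// Hypothetical model of the three possible relationships.
enum Relation {
    case equal, greaterThanOrEqual, lessThanOrEqual
}

// Hypothetical model of one constraint equation:
// attribute1 (relation) multiplier * attribute2 + constant
struct ConstraintEquation {
    var relation: Relation = .equal
    var multiplier: Double = 1.0
    var constant: Double = 0.0

    // Value of the right-hand side for a given value of Attribute 2.
    func rightHandSide(attribute2: Double) -> Double {
        return multiplier * attribute2 + constant
    }

    // Checks whether a candidate value of Attribute 1 satisfies the constraint.
    func isSatisfied(attribute1: Double, attribute2: Double) -> Bool {
        let rhs = rightHandSide(attribute2: attribute2)
        switch relation {
        case .equal: return attribute1 == rhs
        case .greaterThanOrEqual: return attribute1 >= rhs
        case .lessThanOrEqual: return attribute1 <= rhs
        }
    }
}

// RedView.leading = 1.0 * BlueView.trailing + 8.0
let spacing = ConstraintEquation(constant: 8.0)
let blueTrailing = 120.0  // assumed x position of the blue view's trailing edge
let redLeading = spacing.rightHandSide(attribute2: blueTrailing)
print(redLeading)  // prints 128.0

// Constant-only case: no Item 2, multiplier 0.0, so width = 200.0.
let fixedWidth = ConstraintEquation(multiplier: 0.0, constant: 200.0)
let width = fixedWidth.rightHandSide(attribute2: 0.0)
print(width)  // prints 200.0
```

The second case shows why the multiplier is 0.0 when a constant is assigned to a width or height: the Attribute 2 term drops out and only the constant remains.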
Auto Layout Attributes
In Auto Layout, attributes define a feature that can be constrained. In general this includes the four edges (top, bottom, leading, and trailing), as well as height, width, and the vertical and horizontal centers. Text items also have one or more baseline attributes.
Auto Layout constraints let us create views that adapt automatically to different size classes and positions. The constraints make sure your view adjusts to any size change without you manually updating its frame or position.
Besides using Interface Builder, we can also set up Auto Layout in code.
Writing Auto Layout in code has the following advantages and disadvantages:
Advantages:
- Easier to merge code when using Git
- Easier to debug
- Constraints can be easier to read
Disadvantages:
- No visual representation. You have to imagine the result, whereas IB shows your views on screen at many different sizes.
- You may end up writing a lot of layout code in your view controller.
In my opinion, to do Auto Layout well in code you should first be fluent with Interface Builder. Combining Interface Builder with constraints in code can also be a good solution. Keep in mind, however, that mixing the two makes the layout harder to follow.
Creating constraints with Layout Anchors
The first thing to do is set translatesAutoresizingMaskIntoConstraints to false. This prevents the view's autoresizing mask from being translated into Auto Layout constraints that would interfere with your own.
Next, create an array of constraints. In this array you define the constraints you want to attach to a view.
In this case I want a red view with width = 200 and height = 200, centered on screen. So I create the constraint array below.
override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view.
    let myView = UIView(frame: CGRect(x: 100, y: 100, width: 50, height: 50))
    myView.backgroundColor = .red
    view.addSubview(myView)
    myView.translatesAutoresizingMaskIntoConstraints = false
    let constraints = [
        myView.centerYAnchor.constraint(equalTo: view.centerYAnchor),
        myView.centerXAnchor.constraint(equalTo: view.centerXAnchor),
        myView.widthAnchor.constraint(equalToConstant: 200.0),
        myView.heightAnchor.constraint(equalToConstant: 200.0)
    ]
    NSLayoutConstraint.activate(constraints)
}
Those are the basic lines needed to create constraints for a view in code, and they are fairly easy to read. The last line activates the array of constraints you created.
The result looks like the image below:
UIView provides a set of anchor properties that let you set up relationships between views. Here is the list of anchor properties:
extension UIView {
/* Constraint creation conveniences. See NSLayoutAnchor.h for details.
*/
@available(iOS 9.0, *)
open var leadingAnchor: NSLayoutXAxisAnchor { get }
@available(iOS 9.0, *)
open var trailingAnchor: NSLayoutXAxisAnchor { get }
@available(iOS 9.0, *)
open var leftAnchor: NSLayoutXAxisAnchor { get }
@available(iOS 9.0, *)
open var rightAnchor: NSLayoutXAxisAnchor { get }
@available(iOS 9.0, *)
open var topAnchor: NSLayoutYAxisAnchor { get }
@available(iOS 9.0, *)
open var bottomAnchor: NSLayoutYAxisAnchor { get }
@available(iOS 9.0, *)
open var widthAnchor: NSLayoutDimension { get }
@available(iOS 9.0, *)
open var heightAnchor: NSLayoutDimension { get }
@available(iOS 9.0, *)
open var centerXAnchor: NSLayoutXAxisAnchor { get }
@available(iOS 9.0, *)
open var centerYAnchor: NSLayoutYAxisAnchor { get }
@available(iOS 9.0, *)
open var firstBaselineAnchor: NSLayoutYAxisAnchor { get }
@available(iOS 9.0, *)
open var lastBaselineAnchor: NSLayoutYAxisAnchor { get }
}
Each anchor is a subclass of NSLayoutAnchor and comes with common methods for setting up a relationship: equal (=), greater than or equal (>=), and less than or equal (<=), just as in Interface Builder. See the NSLayoutAnchor documentation to get familiar with them.
NOTE: As you can see, anchors are only available from iOS 9. Most apps today support iOS 9 or later, so this is not much of a concern.
Order of constraints
When you start writing your own constraints, it is important to keep the order of the items in mind when you work with constants.
Mục đích của đoạn code này là tạo ra mảng constraint cho innerView sao cho top của nó bằng top của của outerView, bot = bot của outerView, và cách đều 2 bên = 40 points. let constraints = [ innerView.topAnchor.constraint(equalTo: outerView.topAnchor), innerView.leftAnchor.constraint(equalTo: outerView.leftAnchor, constant: 40), innerView.bottomAnchor.constraint(equalTo: outerView.bottomAnchor), innerView.rightAnchor.constraint(equalTo: outerView.rightAnchor, constant: -40) ] Ở dòng được highlight kia giá trị của constant phải = -40. Vì lúc này vị trí của outerView đã được cố định, và outerView.rightAnchor lúc này được coi là gốc(Ox) của trục tọa độ. Vậy nên vị trí bên trái của gốc tọa độ Ox ta hiểu là giá trị âm và ngược lại. Ta có thể đổi ngược lại vị trí của innerView và outerView để được kết quả như trên hình theo đoạn có dưới đây:( Không nên viết theo kiểu này mà nên viết theo kiểu trên để dễ nhìn hơn) outerView.rightAnchor.constraint(equalTo: innerView.rightAnchor, constant: 40)
Tương tự đối với top và bottom là trục Oy hướng xuống dưới, phía trên của gốc Oy là giá trị âm và ngược lại.
NOTE: Khi các bạn thêm các constraint bằng code, các bạn nên làm theo thứ tự để code của ta dễ kiểm soát và dễ hiểu hơn.
Commonly used layout guides
UIView also has a few layout guides that can be used as anchors:
layoutMarginsGuide: Places constraints against the view's margins, keeping a fixed gap (16 points) at the edges of the layout.
readableContentGuide: Constrains the width of the view so the content stays easy to read.
safeAreaLayoutGuide: Places constraints so your view is not covered by bars and other content.
Here is sample code that uses a layout guide (safeAreaLayoutGuide):
Using leftAnchor and rightAnchor may seem like the obvious choice, but you should consider using leadingAnchor and trailingAnchor instead, in order to support right-to-left languages. This matters especially for views such as labels, which you want to be flipped for right-to-left languages.
I hope this article makes creating constraints in code easier for you. It is a great alternative for setting up constraints when you do not want to use Interface Builder.
Do you struggle with setting up a development environment for your application? Every time a new project starts, installing the development environment usually takes quite a lot of time. Building an environment once and sharing it among the project members is therefore really necessary. In this article I will share how I have done this in my own projects.
Why use Docker?
Let me first share what I did before I started using Docker in my projects. In 2014, I began building project development environments with Vagrant. With Vagrant I could package the services used in a project and share them with the project members. But the project used PostgreSQL and Couchbase, so I created two Vagrant virtual machines for those services. As you know, Vagrant creates a complete virtual machine, and then I installed the services I needed on it. That was a real problem for my PC :). You will also have trouble connecting these virtual machines to each other: you need to configure them so that they are on the same private network.
What about Docker? For each service you create a separate container, much like the Vagrant virtual machines I created above. So what is the difference?
With Docker you no longer need to worry about installing the service. The image already takes care of that.
Docker does not create a complete virtual machine with unneeded services inside it. It simply creates an environment that is just enough to run your service.
Configuring a private network and forwarding ports into a container is also easier.
Those reasons were enough for me to switch to Docker 🙂
Use docker-compose.yml to configure the services you want to use in your application
In this article I will configure two containers, one for MySQL and one for DynamoDB.
docker-compose.yml
version: '3'
services:
  mysql:
    image: mysql:5.7.26
    # set the container name so you can reach the container by this name
    container_name: mysql
    ports:
      # forward the port from the host into the docker container
      - '3306:3306'
    volumes:
      # directory containing the schema you want to import into MySQL
      - ./mysql/initdb.d:/docker-entrypoint-initdb.d
      # mount the MySQL data directory so you can back up the data if needed
      - ./mysql/data:/var/lib/mysql
      # settings you need to change for the MySQL service
      - ./mysql/conf.d/my.cnf:/etc/mysql/my.cnf
      # mount the log directory so you can trace errors if needed
      - ./mysql/log:/var/log/mysql
    # environment variables, passed with -e when running the container or configured in .env as shown below
    environment:
      # password for the root account
      - MYSQL_ROOT_PASSWORD=$DB_ROOT_PASSWORD
      # name of the database to create after the container starts
      - MYSQL_DATABASE=$DB_DATABASE
      # create an additional user with the name configured in $DB_USER
      - MYSQL_USER=$DB_USER
      # password for the user created above
      - MYSQL_PASSWORD=$DB_PASSWORD
  dynamodb:
    image: amazon/dynamodb-local
    # set the container name so you can reach the container by this name
    container_name: dynamodb
    ports:
      # forward the port from the host into the docker container
      - '8000:8000'
    volumes:
      # mount the DynamoDB data directory so you can back it up
      - ./dynamodb/data:/home/dynamodblocal/data
    entrypoint: java
    command: '-jar DynamoDBLocal.jar -sharedDb -dbPath /home/dynamodblocal/data'
Set environment variables with .env
In the directory that contains docker-compose.yml, create a .env file as follows:
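The original .env listing did not survive in this copy. As a minimal sketch, the file could be created like this. The four variable names are the ones referenced in docker-compose.yml above; every value, and the PROJECT_DIR variable, is a hypothetical placeholder rather than the author's actual setting.

```shell
# Directory that holds docker-compose.yml. PROJECT_DIR is a hypothetical
# variable; a temporary directory is used when it is not set.
project_dir=${PROJECT_DIR:-$(mktemp -d)}

# Variable names match those referenced in docker-compose.yml.
# All values below are placeholders (assumptions), not real credentials.
cat > "$project_dir/.env" <<'EOF'
DB_ROOT_PASSWORD=change-me-root
DB_DATABASE=app_dev
DB_USER=app_user
DB_PASSWORD=change-me-user
EOF

# Show what docker-compose will substitute into the service definitions.
cat "$project_dir/.env"
```

Keeping the values in .env rather than in docker-compose.yml lets each team member use their own credentials without editing the shared file.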
Nearly every VCS has some form of branching support. Branching means you diverge from the main line of development and continue to do work without messing with that main line. In many VCS tools, this is a somewhat expensive process, often requiring you to create a new copy of your source code directory, which can take a long time for large projects.
Some people refer to the branching model in Git as its "killer feature," and it certainly sets Git apart in the VCS community. Why is it so special? The way Git branches is incredibly lightweight, making branching operations nearly instantaneous and switching back and forth between branches generally just as fast. Unlike many other VCSs, Git encourages a workflow that branches and merges often, even multiple times in a day. Understanding and mastering this feature gives you a powerful and unique tool and can literally change the way that you develop.
What a Branch Is
To really understand the way Git does branching, we need to take a step back and examine how Git stores its data. As you may remember from Chapter 1, Git doesn’t store data as a series of changesets or deltas, but instead as a series of snapshots.
When you commit in Git, Git stores a commit object that contains a pointer to the snapshot of the content you staged, the author and message metadata, and zero or more pointers to the commit or commits that were the direct parents of this commit: zero parents for the first commit, one parent for a normal commit, and multiple parents for a commit that results from a merge of two or more branches.
To visualize this, let’s assume that you have a directory containing three files, and you stage them all and commit. Staging the files checksums each one (the SHA-1 hash we mentioned in Chapter 1), stores that version of the file in the Git repository (Git refers to them as blobs), and adds that checksum to the staging area:
$ git add README test.rb LICENSE
$ git commit -m 'initial commit of my project'
When you create the commit by running git commit, Git checksums each subdirectory (in this case, just the root project directory) and stores those tree objects in the Git repository. Git then creates a commit object that has the metadata and a pointer to the root project tree so it can re-create that snapshot when needed.
Your Git repository now contains five objects: one blob for the contents of each of your three files, one tree that lists the contents of the directory and specifies which file names are stored as which blobs, and one commit with the pointer to that root tree and all the commit metadata. Conceptually, the data in your Git repository looks something like Figure 3.1.
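The object count can be checked in a throwaway repository. This is a minimal sketch, assuming a temporary directory, a made-up commit identity, and file contents invented for the demo (each file gets different contents so that each produces its own blob).

```shell
# Hypothetical identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Three files with distinct contents: one blob each.
echo "readme"  > README
echo "puts 1"  > test.rb
echo "license" > LICENSE
git add README test.rb LICENSE
git commit -q -m 'initial commit of my project'

# Loose objects in the repository: 3 blobs + 1 tree + 1 commit.
object_count=$(git count-objects | cut -d' ' -f1)
echo "objects: $object_count"

# The commit points at the root tree; the tree maps file names to blobs.
git cat-file -p HEAD
git cat-file -p 'HEAD^{tree}'
```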
Figure 3.1: Single commit repository data
If you make some changes and commit again, the next commit stores a pointer to the commit that came immediately before it. After two more commits, your history might look something like Figure 3.2.
Figure 3.2: Git object data for multiple commits
A branch in Git is simply a lightweight movable pointer to one of these commits. The default branch name in Git is master. As you initially make commits, you’re given a master branch that points to the last commit you made. Every time you commit, it moves forward automatically.
What happens if you create a new branch? Well, doing so creates a new pointer for you to move around. Let’s say you create a new branch called testing. You do this with the git branch command:
$ git branch testing
Figure 3.3: Branch pointing into the commit data’s history
Figure 3.4: Multiple branches pointing into the commit’s data history
Figure 3.5: HEAD file pointing to the branch you’re on
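What the figure captions above describe can be observed directly. A minimal sketch, assuming a throwaway repository and a made-up commit identity: git branch adds a second pointer to the current commit, while HEAD keeps naming the branch you were already on.

```shell
# Hypothetical identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the text

echo "v1" > test.rb
git add test.rb
git commit -q -m 'initial commit'

git branch testing

# Both branch names resolve to the same commit.
master_commit=$(git rev-parse master)
testing_commit=$(git rev-parse testing)

# HEAD is a symbolic reference to the current branch; creating a branch
# does not switch to it.
current_branch=$(git symbolic-ref --short HEAD)
echo "master=$master_commit testing=$testing_commit HEAD=$current_branch"
```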
To switch to an existing branch, you run the git checkout command. Let’s switch to the new testing branch:
$ git checkout testing
This moves HEAD to point to the testing branch (see Figure 3.6).
Figure 3.6: HEAD points to another branch when you switch branches.
What is the significance of that? Well, let’s do another commit:
$ vim test.rb
$ git commit -a -m 'made a change'
Figure 3.7: The branch that HEAD points to moves forward with each commit.
This is interesting, because now your testing branch has moved forward, but your master branch still points to the commit you were on when you ran git checkout to switch branches. Let’s switch back to the master branch:
$ git checkout master
Figure 3.8: HEAD moves to another branch on a checkout.
Figure 3.8 shows the result. That command did two things. It moved the HEAD pointer back to point to the master branch, and it reverted the files in your working directory back to the snapshot that master points to. This also means the changes you make from this point forward will diverge from an older version of the project. It essentially rewinds the work you’ve done in your testing branch temporarily so you can go in a different direction.
Let’s make a few changes and commit again:
$ vim test.rb
$ git commit -a -m 'made other changes'
Now your project history has diverged (see Figure 3.9). You created and switched to a branch, did some work on it, and then switched back to your main branch and did other work. Both of those changes are isolated in separate branches: you can switch back and forth between the branches and merge them together when you’re ready. And you did all that with simple branch and checkout commands.
Figure 3.9: The branch histories have diverged.
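The divergence can be reproduced and measured in a scratch repository. A minimal sketch under assumptions: the vim edits are replaced by echo so the script runs unattended, and the commit identity is made up.

```shell
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo "v1" > test.rb
git add test.rb
git commit -q -m 'initial commit'
fork_commit=$(git rev-parse HEAD)

# Work on the testing branch.
git branch testing
git checkout -q testing
echo "change on testing" >> test.rb
git commit -q -a -m 'made a change'

# Go back to master and make a different change.
git checkout -q master
echo "change on master" >> test.rb
git commit -q -a -m 'made other changes'

# Draw both lines of history.
git log --oneline --graph --all

# Each branch has one commit the other lacks; they share the fork commit.
only_master=$(git rev-list --count testing..master)
only_testing=$(git rev-list --count master..testing)
merge_base=$(git merge-base master testing)
```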
Because a branch in Git is in actuality a simple file that contains the 40-character SHA-1 checksum of the commit it points to, branches are cheap to create and destroy. Creating a new branch is as quick and simple as writing 41 bytes to a file (40 characters and a newline).
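The 41-byte claim is easy to check. This sketch assumes a repository that uses SHA-1 object names and stores branches as loose files under .git/refs/heads, which is the layout the text describes; other storage formats, or refs that have been packed, would look different.

```shell
# Hypothetical identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo "v1" > test.rb
git add test.rb
git commit -q -m 'initial commit'
git branch testing

# The branch is a plain file holding the commit checksum and a newline.
ref_file=.git/refs/heads/testing
cat "$ref_file"
ref_size=$(wc -c < "$ref_file" | tr -d ' ')
echo "bytes: $ref_size"
```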
This is in sharp contrast to the way most VCS tools branch, which involves copying all of the project’s files into a second directory. This can take several seconds or even minutes, depending on the size of the project, whereas in Git the process is always instantaneous. Also, because we’re recording the parents when we commit, finding a proper merge base for merging is automatically done for us and is generally very easy to do. These features help encourage developers to create and use branches often.
Let’s see why you should do so.
Basic Branching and Merging
Let’s go through a simple example of branching and merging with a workflow that you might use in the real world. You’ll follow these steps:
Do work on a web site.
Create a branch for a new story you’re working on.
Do some work in that branch.
At this stage, you’ll receive a call that another issue is critical and you need a hotfix. You’ll do the following:
Revert back to your production branch.
Create a branch to add the hotfix.
After it’s tested, merge the hotfix branch, and push to production.
Switch back to your original story and continue working.
Basic Branching
First, let’s say you’re working on your project and have a couple of commits already (see Figure 3.10).
Figure 3.10: A short and simple commit history
You’ve decided that you’re going to work on issue #53 in whatever issue-tracking system your company uses. To be clear, Git isn’t tied into any particular issue-tracking system; but because issue #53 is a focused topic that you want to work on, you’ll create a new branch in which to work. To create a branch and switch to it at the same time, you can run the git checkout command with the -b switch:
$ git checkout -b iss53
Switched to a new branch "iss53"
This is shorthand for
$ git branch iss53
$ git checkout iss53
Figure 3.11: Creating a new branch pointer
You work on your web site and do some commits. Doing so moves the iss53 branch forward, because you have it checked out (that is, your HEAD is pointing to it; see Figure 3.12):
$ vim index.html
$ git commit -a -m 'added a new footer [issue 53]'
Figure 3.12: The iss53 branch has moved forward with your work.
Now you get the call that there is an issue with the web site, and you need to fix it immediately. With Git, you don’t have to deploy your fix along with the iss53 changes you’ve made, and you don’t have to put a lot of effort into reverting those changes before you can work on applying your fix to what is in production. All you have to do is switch back to your master branch.
However, before you do that, note that if your working directory or staging area has uncommitted changes that conflict with the branch you’re checking out, Git won’t let you switch branches. It’s best to have a clean working state when you switch branches. There are ways to get around this (namely, stashing and commit amending) that we’ll cover later. For now, you’ve committed all your changes, so you can switch back to your master branch:
$ git checkout master
Switched to branch "master"
At this point, your project working directory is exactly the way it was before you started working on issue #53, and you can concentrate on your hotfix. This is an important point to remember: Git resets your working directory to look like the snapshot of the commit that the branch you check out points to. It adds, removes, and modifies files automatically to make sure your working copy is what the branch looked like on your last commit to it.
Next, you have a hotfix to make. Let’s create a hotfix branch on which to work until it’s completed (see Figure 3.13):
$ git checkout -b 'hotfix'
Switched to a new branch "hotfix"
$ vim index.html
$ git commit -a -m 'fixed the broken email address'
[hotfix]: created 3a0874c: "fixed the broken email address"
1 files changed, 0 insertions(+), 1 deletions(-)
Figure 3.13: hotfix branch based back at your master branch point
You can run your tests, make sure the hotfix is what you want, and merge it back into your master branch to deploy to production. You do this with the git merge command:
You’ll notice the phrase "Fast forward" in that merge. Because the commit pointed to by the branch you merged in was directly upstream of the commit you’re on, Git moves the pointer forward. To phrase that another way, when you try to merge one commit with a commit that can be reached by following the first commit’s history, Git simplifies things by moving the pointer forward because there is no divergent work to merge together — this is called a "fast forward".
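The fast-forward case can be reproduced in a scratch repository. A minimal sketch, with assumptions: the file contents and commit identity are invented, and the exact wording of the merge output varies between Git versions.

```shell
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo "<p>contact: old address</p>" > index.html
git add index.html
git commit -q -m 'initial commit'

# Make the fix on its own branch.
git checkout -q -b hotfix
echo "<p>contact: new address</p>" > index.html
git commit -q -a -m 'fixed the broken email address'

# master has no commits of its own since the branch point, so the merge
# only moves the master pointer forward.
git checkout -q master
git merge hotfix

master_commit=$(git rev-parse master)
hotfix_commit=$(git rev-parse hotfix)
# No merge commit was created: the tip still has a single parent.
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
```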
Your change is now in the snapshot of the commit pointed to by the master branch, and you can deploy your change (see Figure 3.14).
After your super-important fix is deployed, you’re ready to switch back to the work you were doing before you were interrupted. However, first you’ll delete the hotfix branch, because you no longer need it: the master branch points at the same place. You can delete it with the -d option to git branch:
Figure 3.14: Your master branch points to the same place as your hotfix branch after the merge.
Now you can switch back to your work-in-progress branch on issue #53 and continue working on it (see Figure 3.15):
$ git checkout iss53
Switched to branch "iss53"
$ vim index.html
$ git commit -a -m 'finished the new footer [issue 53]'
[iss53]: created ad82d7a: "finished the new footer [issue 53]"
1 files changed, 1 insertions(+), 0 deletions(-)
Figure 3.15: Your iss53 branch can move forward independently.
It’s worth noting here that the work you did in your hotfix branch is not contained in the files in your iss53 branch. If you need to pull it in, you can merge your master branch into your iss53 branch by running git merge master, or you can wait to integrate those changes until you decide to pull the iss53 branch back into master later.
Basic Merging
Suppose you’ve decided that your issue #53 work is complete and ready to be merged into your master branch. In order to do that, you’ll merge in your iss53 branch, much like you merged in your hotfix branch earlier. All you have to do is check out the branch you wish to merge into and then run the git merge command:
This looks a bit different than the hotfix merge you did earlier. In this case, your development history has diverged from some older point. Because the commit on the branch you’re on isn’t a direct ancestor of the branch you’re merging in, Git has to do some work. In this case, Git does a simple three-way merge, using the two snapshots pointed to by the branch tips and the common ancestor of the two. Figure 3.16 highlights the three snapshots that Git uses to do its merge in this case.
Figure 3.16: Git automatically identifies the best common-ancestor merge base for branch merging.
Instead of just moving the branch pointer forward, Git creates a new snapshot that results from this three-way merge and automatically creates a new commit that points to it (see Figure 3.17). This is referred to as a merge commit and is special in that it has more than one parent. It’s worth pointing out that Git determines the best common ancestor to use for its merge base; this is different than CVS or Subversion (before version 1.5), where the developer doing the merge has to figure out the best merge base for themselves. This makes merging a heck of a lot easier in Git than in these other systems.
Figure 3.17: Git automatically creates a new commit object that contains the merged work.
Now that your work is merged in, you have no further need for the iss53 branch. You can delete it and then manually close the ticket in your ticket-tracking system:
$ git branch -d iss53
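The three-way merge can be reproduced in a scratch repository. A minimal sketch, assuming made-up file names and commit identity; the two branches touch different files so the merge completes without conflicts.

```shell
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo "<p>home</p>" > index.html
git add index.html
git commit -q -m 'initial commit'
initial_commit=$(git rev-parse HEAD)

# Work on the issue branch.
git checkout -q -b iss53
echo "<p>footer</p>" > footer.html
git add footer.html
git commit -q -m 'added a new footer [issue 53]'

# Meanwhile master moves on, so the histories diverge.
git checkout -q master
echo "<p>contact</p>" > contact.html
git add contact.html
git commit -q -m 'fixed the broken email address'

# Git picks the common ancestor by itself.
merge_base=$(git merge-base master iss53)

# Merge; -m supplies the message so no editor is opened.
git merge -q -m "Merge branch 'iss53'" iss53

# The merge commit has two parents.
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')

# iss53 is fully merged now, so -d deletes it without complaint.
git branch -d iss53
```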
Basic Merge Conflicts
Occasionally, this process doesn’t go smoothly. If you changed the same part of the same file differently in the two branches you’re merging together, Git won’t be able to merge them cleanly. If your fix for issue #53 modified the same part of a file as the hotfix, you’ll get a merge conflict that looks something like this:
$ git merge iss53
Auto-merging index.html
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.
Git hasn’t automatically created a new merge commit. It has paused the process while you resolve the conflict. If you want to see which files are unmerged at any point after a merge conflict, you can run git status:
[master*]$ git status
index.html: needs merge
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# unmerged: index.html
#
Anything that has merge conflicts and hasn’t been resolved is listed as unmerged. Git adds standard conflict-resolution markers to the files that have conflicts, so you can open them manually and resolve those conflicts. Your file contains a section that looks something like this:
<<<<<<< HEAD:index.html
<div id="footer">contact : [email protected]</div>
=======
<div id="footer">
please contact us at [email protected]
</div>
>>>>>>> iss53:index.html
This means the version in HEAD (your master branch, because that was what you had checked out when you ran your merge command) is the top part of that block (everything above the =======), while the version in your iss53 branch looks like everything in the bottom part. In order to resolve the conflict, you have to either choose one side or the other or merge the contents yourself. For instance, you might resolve this conflict by replacing the entire block with this:
This resolution has a little of each section, and I’ve fully removed the <<<<<<<, =======, and >>>>>>> lines. After you’ve resolved each of these sections in each conflicted file, run git add on each file to mark it as resolved. Staging the file marks it as resolved in Git. If you want to use a graphical tool to resolve these issues, you can run git mergetool, which fires up an appropriate visual merge tool and walks you through the conflicts:
$ git mergetool
merge tool candidates: kdiff3 tkdiff xxdiff meld gvimdiff opendiff emerge vimdiff
Merging the files: index.html
Normal merge conflict for 'index.html':
{local}: modified
{remote}: modified
Hit return to start merge resolution tool (opendiff):
If you want to use a merge tool other than the default (Git chose opendiff for me in this case because I ran the command on a Mac), you can see all the supported tools listed at the top after "merge tool candidates". Type the name of the tool you’d rather use. In Chapter 7, we’ll discuss how you can change this default value for your environment.
After you exit the merge tool, Git asks you if the merge was successful. If you tell the script that it was, it stages the file to mark it as resolved for you.
You can run git status again to verify that all conflicts have been resolved:
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: index.html
#
If you’re happy with that, and you verify that everything that had conflicts has been staged, you can type git commit to finalize the merge commit. The commit message by default looks something like this:
Merge branch 'iss53'
Conflicts:
index.html
#
# It looks like you may be committing a MERGE.
# If this is not correct, please remove the file
# .git/MERGE_HEAD
# and try again.
#
You can modify that message with details about how you resolved the merge if you think it would be helpful to others looking at this merge in the future — why you did what you did, if it’s not obvious.
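The whole conflict cycle (failed merge, inspection, manual resolution, commit) can be walked through in a scratch repository. A minimal sketch with invented file contents and commit identity; the resolution is written with echo instead of an editor so the script runs unattended.

```shell
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo '<div id="footer">contact us</div>' > index.html
git add index.html
git commit -q -m 'initial commit'

# Both branches change the same line in different ways.
git checkout -q -b iss53
echo '<div id="footer">please contact us</div>' > index.html
git commit -q -a -m 'finished the new footer [issue 53]'
git checkout -q master
echo '<div id="footer">contact : support</div>' > index.html
git commit -q -a -m 'fixed the broken email address'

# The merge stops and exits non-zero when it hits a conflict.
if git merge iss53; then merge_status=clean; else merge_status=conflict; fi

# List the files that are still unmerged and count the conflict markers.
unmerged=$(git diff --name-only --diff-filter=U)
marker_count=$(grep -c '^<<<<<<<' index.html)

# Resolve by writing the final content, then stage the file to mark it resolved.
echo '<div id="footer">please contact us at support</div>' > index.html
git add index.html
remaining=$(git diff --name-only --diff-filter=U)

# Committing concludes the merge.
git commit -q -m "Merge branch 'iss53'"
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
```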
Branch Management
Now that you’ve created, merged, and deleted some branches, let’s look at some branch management tools that will come in handy when you begin using branches all the time.
The git branch command does more than just create and delete branches. If you run it with no arguments, you get a simple listing of your current branches:
$ git branch
iss53
* master
testing
Notice the * character that prefixes the master branch: it indicates the branch that you currently have checked out. This means that if you commit at this point, the master branch will be moved forward with your new work. To see the last commit on each branch, you can run git branch -v:
$ git branch -v
iss53 93b412c fix javascript issue
* master 7a98805 Merge branch 'iss53'
testing 782fd34 add scott to the author list in the readmes
Another useful option to figure out what state your branches are in is to filter this list to branches that you have or have not yet merged into the branch you’re currently on. The useful --merged and --no-merged options have been available in Git since version 1.5.6 for this purpose. To see which branches are already merged into the branch you’re on, you can run git branch --merged:
$ git branch --merged
iss53
* master
Because you already merged in iss53 earlier, you see it in your list. Branches on this list without the * in front of them are generally fine to delete with git branch -d; you’ve already incorporated their work into another branch, so you’re not going to lose anything.
To see all the branches that contain work you haven’t yet merged in, you can run git branch --no-merged:
$ git branch --no-merged
testing
This shows your other branch. Because it contains work that isn’t merged in yet, trying to delete it with git branch -d will fail:
$ git branch -d testing
error: The branch 'testing' is not an ancestor of your current HEAD.
If you are sure you want to delete it, run 'git branch -D testing'.
If you really do want to delete the branch and lose that work, you can force it with -D, as the helpful message points out.
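The behaviour of --merged, --no-merged, -d and -D can be tried out safely in a scratch repository. A minimal sketch with invented commits and identity; the wording of the error message differs between Git versions, so the script only looks at the exit status.

```shell
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

echo "v1" > README
git add README
git commit -q -m 'initial commit'

# iss53 points at a commit master already contains, so it counts as merged.
git branch iss53

# testing gets a commit that master does not have.
git checkout -q -b testing
echo "scott" >> README
git commit -q -a -m 'add scott to the author list in the readmes'
git checkout -q master

# Strip the two-character prefix ("* " or "  ") from each listed branch.
merged=$(git branch --merged | sed 's/^[* ] //')
no_merged=$(git branch --no-merged | sed 's/^[* ] //')
echo "merged: $merged"
echo "not merged: $no_merged"

# -d refuses to delete a branch with unmerged work ...
if git branch -d testing 2>/dev/null; then delete_status=deleted; else delete_status=refused; fi

# ... while -D forces the deletion and discards that work.
git branch -D testing
```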
Branching Workflows
Now that you have the basics of branching and merging down, what can or should you do with them? In this section, we’ll cover some common workflows that this lightweight branching makes possible, so you can decide if you would like to incorporate it into your own development cycle.
Long-Running Branches
Because Git uses a simple three-way merge, merging from one branch into another multiple times over a long period is generally easy to do. This means you can have several branches that are always open and that you use for different stages of your development cycle; you can merge regularly from some of them into others.
Many Git developers have a workflow that embraces this approach, such as having only code that is entirely stable in their master branch — possibly only code that has been or will be released. They have another parallel branch named develop or next that they work from or use to test stability — it isn’t necessarily always stable, but whenever it gets to a stable state, it can be merged into master. It’s used to pull in topic branches (short-lived branches, like your earlier iss53 branch) when they’re ready, to make sure they pass all the tests and don’t introduce bugs.
In reality, we’re talking about pointers moving up the line of commits you’re making. The stable branches are farther down the line in your commit history, and the bleeding-edge branches are farther up the history (see Figure 3.18).
Figure 3.18: More stable branches are generally farther down the commit history.
It’s generally easier to think about them as work silos, where sets of commits graduate to a more stable silo when they’re fully tested (see Figure 3.19).
You can keep doing this for several levels of stability. Some larger projects also have a proposed or pu (proposed updates) branch that has integrated branches that may not be ready to go into the next or master branch. The idea is that your branches are at various levels of stability; when they reach a more stable level, they’re merged into the branch above them. Again, having multiple long-running branches isn’t necessary, but it’s often helpful, especially when you’re dealing with very large or complex projects.
Figure 3.19: It may be helpful to think of your branches as silos.
Topic Branches
Topic branches, however, are useful in projects of any size. A topic branch is a short-lived branch that you create and use for a single particular feature or related work. This is something you’ve likely never done with a VCS before because it’s generally too expensive to create and merge branches. But in Git it’s common to create, work on, merge, and delete branches several times a day.
You saw this in the last section with the iss53 and hotfix branches you created. You did a few commits on them and deleted them directly after merging them into your main branch. This technique allows you to context-switch quickly and completely — because your work is separated into silos where all the changes in that branch have to do with that topic, it’s easier to see what has happened during code review and such. You can keep the changes there for minutes, days, or months, and merge them in when they’re ready, regardless of the order in which they were created or worked on.
Consider an example of doing some work (on master), branching off for an issue (iss91), working on it for a bit, branching off the second branch to try another way of handling the same thing (iss91v2), going back to your master branch and working there for a while, and then branching off there to do some work that you’re not sure is a good idea (dumbidea branch). Your commit history will look something like Figure 3.20.
Now, let’s say you decide you like the second solution to your issue best (iss91v2); and you showed the dumbidea branch to your coworkers, and it turns out to be genius. You can throw away the original iss91 branch (losing commits C5 and C6) and merge in the other two. Your history then looks like Figure 3.21.
It’s important to remember when you’re doing all this that these branches are completely local. When you’re branching and merging, everything is being done only in your Git repository — no server communication is happening.
Figure 3.20: Your commit history with multiple topic branches
Figure 3.21: Your history after merging in dumbidea and iss91v2
Remote Branches
Remote branches are references to the state of branches on your remote repositories. They’re local branches that you can’t move; they’re moved automatically whenever you do any network communication. Remote branches act as bookmarks to remind you where the branches on your remote repositories were the last time you connected to them.
They take the form (remote)/(branch). For instance, if you wanted to see what the master branch on your origin remote looked like as of the last time you communicated with it, you would check the origin/master branch. If you were working on an issue with a partner and they pushed up an iss53 branch, you might have your own local iss53 branch; but the branch on the server would point to the commit at origin/iss53.
This may be a bit confusing, so let’s look at an example. Let’s say you have a Git server on your network at git.ourcompany.com. If you clone from this, Git automatically names it origin for you, pulls down all its data, creates a pointer to where its master branch is, and names it origin/master locally; and you can’t move it. Git also gives you your own master branch starting at the same place as origin’s master branch, so you have something to work from (see Figure 3.22).
Figure 3.22: A Git clone gives you your own master branch and origin/master pointing to origin’s master branch.
If you do some work on your local master branch, and, in the meantime, someone else pushes to git.ourcompany.com and updates its master branch, then your histories move forward differently. Also, as long as you stay out of contact with your origin server, your origin/master pointer doesn’t move (see Figure 3.23).
To synchronize your work, you run a git fetch origin command. This command looks up which server origin is (in this case, it’s git.ourcompany.com), fetches any data from it that you don’t yet have, and updates your local database, moving your origin/master pointer to its new, more up-to-date position (see Figure 3.24).
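The way origin/master stays put until you fetch can be observed without a network. In this minimal sketch a local bare repository stands in for git.ourcompany.com, and a second clone plays the colleague who pushes; the directory names and the commit identity are assumptions made for the demo.

```shell
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work" || exit 1

# Seed a repository and turn it into the "server" (a bare repository).
git init -q seed
(
  cd seed || exit 1
  git symbolic-ref HEAD refs/heads/master
  echo "v1" > README
  git add README
  git commit -q -m 'initial commit'
)
git clone -q --bare seed server.git

# Your clone and a colleague's clone of the same server.
git clone -q server.git mine
git clone -q server.git theirs

# The colleague commits and pushes to the server's master branch.
(
  cd theirs || exit 1
  echo "v2" >> README
  git commit -q -a -m 'work pushed by someone else'
  git push -q origin master
)

cd mine || exit 1
# Before fetching, origin/master still shows the server as you last saw it.
before_fetch=$(git rev-parse origin/master)
git fetch -q origin
# After fetching, origin/master has moved; your own master has not.
after_fetch=$(git rev-parse origin/master)
local_master=$(git rev-parse master)
server_master=$(git --git-dir=../server.git rev-parse master)
```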
To demonstrate having multiple remote servers and what remote branches for those remote projects look like, let’s assume you have another internal Git server that is used only for development by one of your sprint teams. This server is at git.team1.ourcompany.com. You can add it as a new remote reference to the project you’re currently working on by running the git remote add command as we covered in Chapter 2. Name this remote teamone, which will be your shortname for that whole URL (see Figure 3.25).
Figure 3.23: Working locally and having someone push to your remote server makes each history move forward differently.
Figure 3.24: The git fetch command updates your remote references.
Now, you can run git fetch teamone to fetch everything the teamone server has that you don’t have yet. Because that server has a subset of the data your origin server has right now, Git fetches no data but sets a remote branch called teamone/master to point to the commit that teamone has as its master branch (see Figure 3.26).
Figure 3.25: Adding another server as a remote
Figure 3.26: You get a reference to teamone’s master branch position locally.
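The behaviour of origin/master can be reproduced without a network. The following is a minimal sketch, assuming a local bare repository stands in for git.ourcompany.com; the directory names, the demo identity, and the file contents are hypothetical and only there to make the script self-contained.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A bare repository stands in for the server.
git init -q --bare "$work/server.git"
git -C "$work/server.git" symbolic-ref HEAD refs/heads/master

# A colleague creates the first commit and pushes it.
git init -q "$work/colleague"
cd "$work/colleague"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/server.git"
echo one > file.txt && git add file.txt && git commit -q -m "first"
git push -q origin master

# You clone: Git creates origin/master and a local master at the same commit.
git clone -q "$work/server.git" "$work/you"

# The colleague pushes again; your origin/master does not move yet.
echo two >> file.txt && git commit -q -am "second"
git push -q origin master
server_tip=$(git rev-parse master)

cd "$work/you"
before=$(git rev-parse origin/master)
git fetch -q origin
after=$(git rev-parse origin/master)
local_master=$(git rev-parse master)
echo "origin/master before fetch: $before"
echo "origin/master after fetch:  $after"
echo "local master (unchanged):   $local_master"

cd / && rm -rf "$work"
```

Only the fetch moves origin/master; your local master stays where it was until you merge.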
Pushing
When you want to share a branch with the world, you need to push it up to a remote that you have write access to. Your local branches aren’t automatically synchronized to the remotes you write to; you have to explicitly push the branches you want to share. That way, you can use private branches for work you don’t want to share, and push up only the topic branches you want to collaborate on.
If you have a branch named serverfix that you want to work on with others, you can push it up the same way you pushed your first branch. Run git push (remote) (branch):
$ git push origin serverfix
This is a bit of a shortcut. Git automatically expands the serverfix branchname out to refs/heads/serverfix:refs/heads/serverfix, which means, "Take my serverfix local branch and push it to update the remote’s serverfix branch." We’ll go over the refs/heads/ part in detail in Chapter 9, but you can generally leave it off. You can also do git push origin serverfix:serverfix, which does the same thing — it says, "Take my serverfix and make it the remote’s serverfix." You can use this format to push a local branch into a remote branch that is named differently. If you didn’t want it to be called serverfix on the remote, you could instead run git push origin serverfix:awesomebranch to push your local serverfix branch to the awesomebranch branch on the remote project.
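The two push forms described here can be tried against a local bare repository. This is a sketch under assumptions: the paths, the demo identity, and the file fix.txt are hypothetical; the branch names serverfix and awesomebranch come from the text.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/server.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/server.git"
echo fix > fix.txt && git add fix.txt && git commit -q -m "server fix"
git branch serverfix

# Short form: Git expands it to refs/heads/serverfix:refs/heads/serverfix.
git push -q origin serverfix
# Explicit local:remote form, using a different name on the remote.
git push -q origin serverfix:awesomebranch

local_tip=$(git rev-parse serverfix)
remote_serverfix=$(git ls-remote origin refs/heads/serverfix | cut -f1)
remote_awesome=$(git ls-remote origin refs/heads/awesomebranch | cut -f1)
echo "local serverfix:       $local_tip"
echo "remote serverfix:      $remote_serverfix"
echo "remote awesomebranch:  $remote_awesome"

cd / && rm -rf "$work"
```

Both remote branches end up pointing at the same commit as the local serverfix branch.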
The next time one of your collaborators fetches from the server, they will get a reference to where the server’s version of serverfix is under the remote branch origin/serverfix:
$ git fetch origin
It’s important to note that when you do a fetch that brings down new remote branches, you don’t automatically have local, editable copies of them. In other words, in this case, you don’t have a new serverfix branch; you only have an origin/serverfix pointer that you can’t modify.
To merge this work into your current working branch, you can run git merge origin/serverfix. If you want your own serverfix branch that you can work on, you can base it off your remote branch:
$ git checkout -b serverfix origin/serverfix
Branch serverfix set up to track remote branch refs/remotes/origin/serverfix.
Switched to a new branch "serverfix"
This gives you a local branch that you can work on that starts where origin/serverfix is.
Tracking Branches
Checking out a local branch from a remote branch automatically creates what is called a tracking branch. Tracking branches are local branches that have a direct relationship to a remote branch. If you’re on a tracking branch and type git push, Git automatically knows which server and branch to push to. Also, running git pull while on one of these branches fetches all the remote references and then automatically merges in the corresponding remote branch.
When you clone a repository, it generally automatically creates a master branch that tracks origin/master. That’s why git push and git pull work out of the box with no other arguments. However, you can set up other tracking branches if you wish: ones that don’t track branches on origin and don’t track the master branch. The simple case is the example you just saw, running git checkout -b [branch] [remotename]/[branch]. If you have Git version 1.6.2 or later, you can also use the --track shorthand:
$ git checkout --track origin/serverfix
Branch serverfix set up to track remote branch refs/remotes/origin/serverfix.
Switched to a new branch "serverfix"
To set up a local branch with a different name than the remote branch, you can easily use the first version with a different local branch name:
$ git checkout -b sf origin/serverfix
Branch sf set up to track remote branch refs/remotes/origin/serverfix.
Switched to a new branch "sf"
Now, your local branch sf will automatically push to and pull from origin/serverfix.
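To confirm which remote branch a tracking branch follows, you can ask Git for its upstream. The sketch below assumes Git's default configuration and a local bare repository in place of a real server; the paths and demo identity are hypothetical.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Server with a master branch and a serverfix branch.
git init -q --bare "$work/server.git"
git -C "$work/server.git" symbolic-ref HEAD refs/heads/master
git init -q "$work/author"
cd "$work/author"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/server.git"
echo base > file.txt && git add file.txt && git commit -q -m "base"
git branch serverfix
git push -q origin master serverfix

# Your clone: create tracking branches in the two ways described above.
git clone -q "$work/server.git" "$work/you"
cd "$work/you"
git checkout -q -b sf origin/serverfix
git checkout -q --track origin/serverfix

# @{upstream} resolves to the remote branch each local branch tracks.
sf_upstream=$(git rev-parse --abbrev-ref 'sf@{upstream}')
serverfix_upstream=$(git rev-parse --abbrev-ref 'serverfix@{upstream}')
echo "sf tracks $sf_upstream"
echo "serverfix tracks $serverfix_upstream"

cd / && rm -rf "$work"
```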
Deleting Remote Branches
Suppose you’re done with a remote branch: say, you and your collaborators are finished with a feature and have merged it into your remote’s master branch (or whatever branch your stable codeline is in). You can delete a remote branch using the rather obtuse syntax git push [remotename] :[branch]. If you want to delete your serverfix branch from the server, you run the following:
$ git push origin :serverfix
Boom. No more branch on your server. You may want to dog-ear this page, because you’ll need that command, and you’ll likely forget the syntax. A way to remember this command is by recalling the git push [remotename] [localbranch]:[remotebranch] syntax that we went over a bit earlier. If you leave off the [localbranch] portion, then you’re basically saying, "Take nothing on my side and make it be [remotebranch]."
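The empty-source push can be checked against a local bare repository. This is a sketch with hypothetical paths and a hypothetical demo identity; it lists the remote's serverfix branch before and after the delete.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/server.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/server.git"
echo fix > fix.txt && git add fix.txt && git commit -q -m "server fix"
git branch serverfix
git push -q origin master serverfix

before_delete=$(git ls-remote --heads origin serverfix)

# "Take nothing on my side and make it be serverfix" on the remote.
git push -q origin :serverfix

after_delete=$(git ls-remote --heads origin serverfix)
echo "before: ${before_delete:-<none>}"
echo "after:  ${after_delete:-<none>}"

cd / && rm -rf "$work"
```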
Rebasing
In Git, there are two main ways to integrate changes from one branch into another: the merge and the rebase. In this section you’ll learn what rebasing is, how to do it, why it’s a pretty amazing tool, and in what cases you won’t want to use it.
Figure 3.27: Your initial diverged commit history
The Basic Rebase
If you go back to an earlier example from the Merge section (see Figure 3.27), you can see that you diverged your work and made commits on two different branches.
The easiest way to integrate the branches, as we’ve already covered, is the merge command. It performs a three-way merge between the two latest branch snapshots (C3 and C4) and the most recent common ancestor of the two (C2), creating a new snapshot (and commit), as shown in Figure 3.28.
Figure 3.28: Merging a branch to integrate the diverged work history
However, there is another way: you can take the patch of the change that was introduced in C3 and reapply it on top of C4. In Git, this is called rebasing. With the rebase command, you can take all the changes that were committed on one branch and replay them on another one. In this example, you’d run the following:
$ git checkout experiment
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: added staged command
It works by going to the common ancestor of the two branches (the one you’re on and the one you’re rebasing onto), getting the diff introduced by each commit of the branch you’re on, saving those diffs to temporary files, resetting the current branch to the same commit as the branch you are rebasing onto, and finally applying each change in turn. Figure 3.29 illustrates this process.
Figure 3.29: Rebasing the change introduced in C3 onto C4
At this point, you can go back to the master branch and do a fast-forward merge (see Figure 3.30).
Figure 3.30: Fast-forwarding the master branch
Now, the snapshot pointed to by C3’ (the rebased copy of C3) is exactly the same as the one that was pointed to by C5 in the merge example. There is no difference in the end product of the integration, but rebasing makes for a cleaner history. If you examine the log of a rebased branch, it looks like a linear history: it appears that all the work happened in series, even when it originally happened in parallel.
Often, you’ll do this to make sure your commits apply cleanly on a remote branch — perhaps in a project to which you’re trying to contribute but that you don’t maintain. In this case, you’d do your work in a branch and then rebase your work onto origin/master when you were ready to submit your patches to the main project. That way, the maintainer doesn’t have to do any integration work — just a fast-forward or a clean apply.
Note that the snapshot pointed to by the final commit you end up with, whether it’s the last of the rebased commits for a rebase or the final merge commit after a merge, is the same snapshot — it’s only the history that is different. Rebasing replays changes from one line of work onto another in the order they were introduced, whereas merging takes the endpoints and merges them together.
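The claim that merge and rebase end in the same snapshot can be checked by comparing tree hashes. The sketch below assumes a small throwaway repository; the commit names follow the figures, while the via-merge branch, the helper function, and the file names are hypothetical.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master

# Helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit C2                                   # common ancestor
git checkout -q -b experiment; commit C3    # work on the topic branch
git checkout -q master; commit C4           # meanwhile, work on master

# Variant 1: integrate with a merge on a scratch branch.
git checkout -q -b via-merge master
git merge -q --no-edit experiment
merge_tree=$(git rev-parse 'via-merge^{tree}')

# Variant 2: rebase experiment onto master, then fast-forward master.
git checkout -q experiment
git rebase -q master
git checkout -q master
git merge -q --ff-only experiment
rebase_tree=$(git rev-parse 'master^{tree}')
merge_commits=$(git rev-list --merges --count master)

echo "tree after merge:  $merge_tree"
echo "tree after rebase: $rebase_tree"
echo "merge commits in rebased history: $merge_commits"

cd / && rm -rf "$work"
```

The two tree hashes match, while the rebased master contains no merge commit: same snapshot, different history.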
More Interesting Rebases
You can also have your rebase replay on something other than the rebase branch. Take a history like Figure 3.31, for example. You branched a topic branch (server) to add some server-side functionality to your project, and made a commit. Then, you branched off that to make the client-side changes (client) and committed a few times. Finally, you went back to your server branch and did a few more commits.
Figure 3.31: A history with a topic branch off another topic branch
Suppose you decide that you want to merge your client-side changes into your mainline for a release, but you want to hold off on the server-side changes until it’s tested further. You can take the changes on client that aren’t on server (C8 and C9) and replay them on your master branch by using the --onto option of git rebase:
$ git rebase --onto master server client
This basically says, "Check out the client branch, figure out the patches from the common ancestor of the client and server branches, and then replay them onto master." It’s a bit complex; but the result, shown in Figure 3.32, is pretty cool.
Figure 3.32: Rebasing a topic branch off another topic branch
Now you can fast-forward your master branch (see Figure 3.33):
$ git checkout master
$ git merge client
Figure 3.33: Fast-forwarding your master branch to include the client branch changes
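To pull in the server branch as well, first replay it on top of master by running git rebase master server, which checks out server and rebases it without a separate checkout step. Then, you can fast-forward the base branch (master):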
$ git checkout master
$ git merge server
You can remove the client and server branches because all the work is integrated and you don’t need them anymore, leaving your history for this entire process looking like Figure 3.35:
$ git branch -d client
$ git branch -d server
Figure 3.35: Final commit history
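The whole sequence from this section can be replayed in a scratch repository. This is a sketch, assuming each commit only adds its own file so that no conflicts occur; the commit labels mirror the figures, and the helper function and file names are hypothetical.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master

# Helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit C1; commit C2
git checkout -q -b server; commit C3
git checkout -q -b client; commit C8; commit C9
git checkout -q server; commit C4; commit C10
git checkout -q master; commit C5; commit C6

# Replay only the commits on client that are not on server onto master.
git rebase -q --onto master server client
client_only=$(git log --format=%s --reverse master..client | paste -sd' ' -)

# Fast-forward master, rebase server onto it, fast-forward again.
git checkout -q master && git merge -q --ff-only client
git rebase -q master server
git checkout -q master && git merge -q --ff-only server
final=$(git log --format=%s --reverse master | paste -sd' ' -)

echo "replayed from client: $client_only"
echo "final history:        $final"

cd / && rm -rf "$work"
```

The final history is linear, with the client commits placed before the server commits.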
The Perils of Rebasing
Ahh, but the bliss of rebasing isn’t without its drawbacks, which can be summed up in a single line:
Do not rebase commits that you have pushed to a public repository.
If you follow that guideline, you’ll be fine. If you don’t, people will hate you, and you’ll be scorned by friends and family.
When you rebase stuff, you’re abandoning existing commits and creating new ones that are similar but different. If you push commits somewhere and others pull them down and base work on them, and then you rewrite those commits with git rebase and push them up again, your collaborators will have to re-merge their work and things will get messy when you try to pull their work back into yours.
Let’s look at an example of how rebasing work that you’ve made public can cause problems. Suppose you clone from a central server and then do some work off that. Your commit history looks like Figure 3.36.
Figure 3.36: Clone a repository, and base some work on it.
Now, someone else does more work that includes a merge, and pushes that work to the central server. You fetch them and merge the new remote branch into your work, making your history look something like Figure 3.37.
Next, the person who pushed the merged work decides to go back and rebase their work instead; they do a git push --force to overwrite the history on the server. You then fetch from that server, bringing down the new commits.
At this point, you have to merge this work in again, even though you’ve already done so. Rebasing changes the SHA–1 hashes of these commits so to Git they look like new commits, when in fact you already have the C4 work in your history (see Figure 3.39).
You have to merge that work in at some point so you can keep up with the other developer in the future. After you do that, your commit history will contain both the C4 and C4’ commits, which have different SHA–1 hashes but introduce the same work and have the same commit message. If you run a git log when your history looks like this, you’ll see two commits that have the same author date and message, which will be confusing. Furthermore, if you push this history back up to the server, you’ll reintroduce all those rebased commits to the central server, which can further confuse people.
Figure 3.37: Fetch more commits, and merge them into your work.
Figure 3.38: Someone pushes rebased commits, abandoning commits you’ve based your work on.
If you treat rebasing as a way to clean up and work with commits before you push them, and if you only rebase commits that have never been available publicly, then you’ll be fine. If you rebase commits that have already been pushed publicly, and people may have based work on those commits, then you may be in for some frustrating trouble.
Figure 3.39: You merge in the same work again into a new merge commit.
Summary
We’ve covered basic branching and merging in Git. You should feel comfortable creating and switching to new branches, switching between branches and merging local branches together. You should also be able to share your branches by pushing them to a shared server, working with others on shared branches and rebasing your branches before they are shared.
You can get a Git project using two main approaches. The first takes an existing project or directory and imports it into Git. The second clones an existing Git repository from another server.
Initializing a Repository in an Existing Directory
If you’re starting to track an existing project in Git, you need to go to the project’s directory and type
$ git init
This creates a new subdirectory named .git that contains all of your necessary repository files: a Git repository skeleton. At this point, nothing in your project is tracked yet. (See Chapter 9 for more information about exactly what files are contained in the .git directory you just created.)
If you want to start version-controlling existing files (as opposed to an empty directory), you should probably begin tracking those files and do an initial commit. You can accomplish that with a few git add commands that specify the files you want to track, followed by a commit:
We’ll go over what these commands do in just a minute. At this point, you have a Git repository with tracked files and an initial commit.
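As an illustration of that sequence, here is a minimal sketch in a throwaway directory. The file names main.c and README, the commit message, and the demo identity are assumptions made for the example, not taken from a real project.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# An existing project directory with two (hypothetical) files.
mkdir "$work/project"
cd "$work/project"
echo 'int main(void) { return 0; }' > main.c
echo 'My project' > README

git init -q                      # creates the .git subdirectory
git add main.c                   # start tracking the files ...
git add README
git commit -q -m 'initial project version'   # ... and record the first snapshot

tracked=$(git ls-files | paste -sd' ' -)
commits=$(git rev-list --count HEAD)
echo "tracked files: $tracked"
echo "commits: $commits"

cd / && rm -rf "$work"
```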
Cloning an Existing Repository
If you want to get a copy of an existing Git repository — for example, a project you’d like to contribute to — the command you need is git clone. If you’re familiar with other VCS systems such as Subversion, you’ll notice that the command is clone and not checkout. This is an important distinction — Git receives a copy of nearly all data that the server has. Every version of every file for the history of the project is pulled down when you run git clone. In fact, if your server disk gets corrupted, you can use any of the clones on any client to set the server back to the state it was in when it was cloned (you may lose some server-side hooks and such, but all the versioned data would be there—see Chapter 4 for more details). You clone a repository with git clone [url]. For example, if you want to clone the Ruby Git library called Grit, you can do so like this:
$ git clone git://github.com/wingadium1/grit.git
That creates a directory named grit, initializes a .git directory inside it, pulls down all the data for that repository, and checks out a working copy of the latest version. If you go into the new grit directory, you’ll see the project files in there, ready to be worked on or used. If you want to clone the repository into a directory named something other than grit, you can specify that as the next command-line option:
$ git clone git://github.com/wingadium1/grit.git mygrit
That command does the same thing as the previous one, but the target directory is called mygrit. Git has a number of different transfer protocols you can use. The previous example uses the git:// protocol, but you may also see http(s):// or user@server:/path.git, which uses the SSH transfer protocol. Chapter 4 will introduce all of the available options the server can set up to access your Git repository and the pros and cons of each.
Recording Changes to the Repository
You have a bona fide Git repository and a checkout or working copy of the files for that project. You need to make some changes and commit snapshots of those changes into your repository each time the project reaches a state you want to record.
Remember that each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. Untracked files are everything else – any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified because you just checked them out and haven’t edited anything.
As you edit files, Git sees them as modified, because you’ve changed them since your last commit. You stage these modified files and then commit all your staged changes, and the cycle repeats. This lifecycle is illustrated in Figure 2.1.
Figure 2.1 The lifecycle of the status of your files
Checking the Status of Your Files
The main tool you use to determine which files are in which state is the git status command. If you run this command directly after a clone, you should see something like this:
$ git status
# On branch master
nothing to commit (working directory clean)
This means you have a clean working directory; in other words, there are no tracked and modified files. Git also doesn’t see any untracked files, or they would be listed here. Finally, the command tells you which branch you’re on. For now, that is always master, which is the default; you won’t worry about it here. The next chapter will go over branches and references in detail.
Let’s say you add a new file to your project, a simple README file. If the file didn’t exist before, and you run git status, you see your untracked file like so:
$ vim README
$ git status
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# README
nothing added to commit but untracked files present (use "git add" to track)
You can see that your new README file is untracked, because it’s under the "Untracked files" heading in your status output. Untracked basically means that Git sees a file you didn’t have in the previous snapshot (commit); Git won’t start including it in your commit snapshots until you explicitly tell it to do so. It does this so you don’t accidentally begin including generated binary files or other files that you did not mean to include. You do want to start including README, so let’s start tracking the file.
Tracking New Files
In order to begin tracking a new file, you use the command git add. To begin tracking the README file, you can run this:
$ git add README
If you run your status command again, you can see that your README file is now tracked and staged:
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: README
#
You can tell that it’s staged because it’s under the "Changes to be committed" heading. If you commit at this point, the version of the file at the time you ran git add is what will be in the historical snapshot. You may recall that when you ran git init earlier, you then ran git add (files) — that was to begin tracking files in your directory. The git add command takes a path name for either a file or a directory; if it’s a directory, the command adds all the files in that directory recursively.
Staging Modified Files
Let’s change a file that was already tracked. If you change a previously tracked file called benchmarks.rb and then run your status command again, you get something that looks like this:
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: README
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# modified: benchmarks.rb
#
The benchmarks.rb file appears under a section named "Changed but not updated" — which means that a file that is tracked has been modified in the working directory but not yet staged. To stage it, you run the git add command (it’s a multipurpose command—you use it to begin tracking new files, to stage files, and to do other things like marking merge-conflicted files as resolved). Let’s run git add now to stage the benchmarks.rb file, and then run git status again:
$ git add benchmarks.rb
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: README
# modified: benchmarks.rb
#
Both files are staged and will go into your next commit. At this point, suppose you remember one little change that you want to make in benchmarks.rb before you commit it. You open it again and make that change, and you’re ready to commit. However, let’s run git status one more time:
$ vim benchmarks.rb
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: README
# modified: benchmarks.rb
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# modified: benchmarks.rb
#
What the heck? Now benchmarks.rb is listed as both staged and unstaged. How is that possible? It turns out that Git stages a file exactly as it is when you run the git add command. If you commit now, the version of benchmarks.rb as it was when you last ran the git add command is how it will go into the commit, not the version of the file as it looks in your working directory when you run git commit. If you modify a file after you run git add, you have to run git add again to stage the latest version of the file:
$ git add benchmarks.rb
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: README
# modified: benchmarks.rb
#
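The difference between the staged version and the working-directory version can be inspected directly. The sketch below assumes a scratch repository; the file name benchmarks.rb comes from the text, while its contents (v1, v2, v3) and the demo identity are hypothetical.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

echo v1 > benchmarks.rb
git add benchmarks.rb && git commit -q -m "first version"

echo v2 > benchmarks.rb
git add benchmarks.rb        # the index now holds v2
echo v3 > benchmarks.rb      # edited again after staging

staged=$(git show :benchmarks.rb)             # content in the staging area
working=$(cat benchmarks.rb)                  # content on disk
state=$(git status --porcelain benchmarks.rb) # "MM" = staged and unstaged changes
echo "staged: $staged, working: $working, status: $state"

cd / && rm -rf "$work"
```

A commit at this point would record v2; running git add again would stage v3.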
Ignoring Files
Often, you’ll have a class of files that you don’t want Git to automatically add or even show you as being untracked. These are generally automatically generated files such as log files or files produced by your build system. In such cases, you can create a file listing patterns to match them named .gitignore. Here is an example .gitignore file:
$ cat .gitignore
*.[oa]
*~
The first line tells Git to ignore any files ending in .o or .a, that is, object and archive files that may be the product of building your code. The second line tells Git to ignore all files that end with a tilde (~), which is used by many text editors such as Emacs to mark temporary files. You may also include a log, tmp, or pid directory; automatically generated documentation; and so on. Setting up a .gitignore file before you get going is generally a good idea so you don’t accidentally commit files that you really don’t want in your Git repository. The rules for the patterns you can put in the .gitignore file are as follows:
Blank lines or lines starting with # are ignored.
Standard glob patterns work.
You can end patterns with a forward slash (/) to specify a directory.
You can negate a pattern by starting it with an exclamation point (!).
Glob patterns are like simplified regular expressions that shells use. An asterisk (*) matches zero or more characters; [abc] matches any character inside the brackets (in this case a, b, or c); a question mark (?) matches a single character; and brackets enclosing characters separated by a hyphen ([0-9]) match any character between them (in this case 0 through 9).
Here is another example .gitignore file:
# a comment - this is ignored
# no .a files
*.a
# but do track lib.a, even though you're ignoring .a files above
!lib.a
# only ignore the root TODO file, not subdir/TODO
/TODO
# ignore all files in the build/ directory
build/
# ignore doc/notes.txt, but not doc/server/arch.txt
doc/*.txt
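One way to check what such a file really ignores is to create matching files and list the untracked files that survive the ignore rules. This is a sketch, assuming an otherwise empty scratch repository and no global exclude file; the sample file names are hypothetical, chosen to hit each pattern above.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

cat > .gitignore <<'EOF'
*.a
!lib.a
/TODO
build/
doc/*.txt
EOF

# Hypothetical files, one or two per pattern.
mkdir -p build doc/server subdir
touch foo.a lib.a TODO subdir/TODO build/out.o doc/notes.txt doc/server/arch.txt

# Untracked files that are NOT ignored, in a stable order.
kept=$(git ls-files --others --exclude-standard | LC_ALL=C sort | paste -sd' ' -)
echo "not ignored: $kept"

cd / && rm -rf "$work"
```

The files lib.a, subdir/TODO and doc/server/arch.txt remain visible to Git, while foo.a, the root TODO, everything under build/, and doc/notes.txt are ignored.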
Viewing Your Staged and Unstaged Changes
If the git status command is too vague for you (you want to know exactly what you changed, not just which files were changed), you can use the git diff command. We’ll cover git diff in more detail later; but you’ll probably use it most often to answer these two questions: What have you changed but not yet staged? And what have you staged that you are about to commit? Although git status answers those questions very generally, git diff shows you the exact lines added and removed: the patch, as it were.
Let’s say you edit and stage the README file again and then edit the benchmarks.rb file without staging it. If you run your status command, you once again see something like this:
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: README
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# modified: benchmarks.rb
#
To see what you’ve changed but not yet staged, type git diff with no other arguments:
$ git diff
diff --git a/benchmarks.rb b/benchmarks.rb
index 3cb747f..da65585 100644
--- a/benchmarks.rb
+++ b/benchmarks.rb
@@ -36,6 +36,10 @@ def main
@commit.parents[0].parents[0].parents[0]
end
+ run_code(x, 'commits 1') do
+ git.commits.size
+ end
+
run_code(x, 'commits 2') do
log = git.commits('master', 15)
log.size
That command compares what is in your working directory with what is in your staging area. The result tells you the changes you’ve made that you haven’t yet staged.
If you want to see what you’ve staged that will go into your next commit, you can use git diff --cached. (In Git versions 1.6.1 and later, you can also use git diff --staged, which may be easier to remember.) This command compares your staged changes to your last commit:
$ git diff --cached
diff --git a/README b/README
new file mode 100644
index 0000000..03902a1
--- /dev/null
+++ b/README
@@ -0,0 +1,5 @@
+ grit
+ by Tom Preston-Werner, Chris Wanstrath
+ http://github.com/mojombo/grit
+
+ Grit is a Ruby library for extracting information from a Git repository
It’s important to note that git diff by itself doesn’t show all changes made since your last commit — only changes that are still unstaged. This can be confusing, because if you’ve staged all of your changes, git diff will give you no output.
For another example, if you stage the benchmarks.rb file and then edit it, you can use git diff to see the changes in the file that are staged and the changes that are unstaged:
$ git add benchmarks.rb
$ echo '# test line' >> benchmarks.rb
$ git status
# On branch master
#
# Changes to be committed:
#
# modified: benchmarks.rb
#
# Changed but not updated:
#
# modified: benchmarks.rb
#
Now you can use git diff to see what is still unstaged:
$ git diff
diff --git a/benchmarks.rb b/benchmarks.rb
index e445e28..86b2f7c 100644
--- a/benchmarks.rb
+++ b/benchmarks.rb
@@ -127,3 +127,4 @@ end
main()
##pp Grit::GitRuby.cache_client.stats
+# test line
and git diff --cached to see what you’ve staged so far:
$ git diff --cached
diff --git a/benchmarks.rb b/benchmarks.rb
index 3cb747f..e445e28 100644
--- a/benchmarks.rb
+++ b/benchmarks.rb
@@ -36,6 +36,10 @@ def main
@commit.parents[0].parents[0].parents[0]
end
+ run_code(x, 'commits 1') do
+ git.commits.size
+ end
+
run_code(x, 'commits 2') do
log = git.commits('master', 15)
log.size
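The split between git diff and git diff --cached can be reproduced with a three-line file. The following sketch assumes a scratch repository; the file contents and the demo identity are hypothetical, and grep is only used to pick out the added lines from each patch.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

printf 'line1\n' > benchmarks.rb
git add benchmarks.rb && git commit -q -m "base"

printf 'line1\nstaged line\n' > benchmarks.rb
git add benchmarks.rb                                   # stage one change
printf 'line1\nstaged line\n# test line\n' > benchmarks.rb   # then edit again

# Added lines only; '^+[^+]' skips the "+++ b/..." header line.
unstaged=$(git diff | grep '^+[^+]')            # working directory vs. staging area
staged=$(git diff --cached | grep '^+[^+]')     # staging area vs. last commit

git add benchmarks.rb
after_add=$(git diff)                           # everything staged: no output

echo "unstaged: $unstaged"
echo "staged:   $staged"

cd / && rm -rf "$work"
```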
Committing Your Changes
Now that your staging area is set up the way you want it, you can commit your changes. Remember that anything that is still unstaged (any files you have created or modified that you haven’t run git add on since you edited them) won’t go into this commit. They will stay as modified files on your disk. In this case, the last time you ran git status, you saw that everything was staged, so you’re ready to commit your changes. The simplest way to commit is to type git commit:
$ git commit
Doing so launches your editor of choice. (This is set by your shell’s $EDITOR environment variable, usually vim or emacs, although you can configure it with whatever you want using the git config --global core.editor command as you saw in Chapter 1.)
The editor displays the following text (this example is a Vim screen):
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: README
# modified: benchmarks.rb
~
~
~
".git/COMMIT_EDITMSG" 10L, 283C
You can see that the default commit message contains the latest output of the git status command commented out and one empty line on top. You can remove these comments and type your commit message, or you can leave them there to help you remember what you’re committing. (For an even more explicit reminder of what you’ve modified, you can pass the -v option to git commit. Doing so also puts the diff of your change in the editor so you can see exactly what you did.) When you exit the editor, Git creates your commit with that commit message (with the comments and diff stripped out).
Alternatively, you can type your commit message inline with the commit command by specifying it after a -m flag, like this:
$ git commit -m "Story 182: Fix benchmarks for speed"
[master]: created 463dc4f: "Fix benchmarks for speed"
2 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 README
Now you’ve created your first commit! You can see that the commit has given you some output about itself: which branch you committed to (master), what SHA-1 checksum the commit has (463dc4f), how many files were changed, and statistics about lines added and removed in the commit.
Remember that the commit records the snapshot you set up in your staging area. Anything you didn’t stage is still sitting there modified; you can do another commit to add it to your history. Every time you perform a commit, you’re recording a snapshot of your project that you can revert to or compare to later.
Skipping the Staging Area
Although it can be amazingly useful for crafting commits exactly how you want them, the staging area is sometimes a bit more complex than you need in your workflow. If you want to skip the staging area, Git provides a simple shortcut. Providing the -a option to the git commit command makes Git automatically stage every file that is already tracked before doing the commit, letting you skip the git add part:
$ git status
# On branch master
#
# Changed but not updated:
#
# modified: benchmarks.rb
#
$ git commit -a -m 'added new benchmarks'
[master 83e38c7] added new benchmarks
1 files changed, 5 insertions(+), 0 deletions(-)
Notice how you don’t have to run git add on the benchmarks.rb file in this case before you commit.
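Note that -a only covers files Git already tracks. The sketch below illustrates this in a scratch repository; the file untracked.txt, the file contents, and the demo identity are hypothetical.

```shell
set -eu
# Hypothetical throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

echo v1 > benchmarks.rb
git add benchmarks.rb && git commit -q -m "first version"

echo v2 > benchmarks.rb      # modified, tracked, not staged
echo new > untracked.txt     # never added to Git

# -a stages every tracked, modified file before committing.
git commit -q -a -m 'added new benchmarks'

committed=$(git show HEAD:benchmarks.rb)
leftover=$(git status --porcelain)
echo "committed content: $committed"
echo "still pending:     $leftover"

cd / && rm -rf "$work"
```

The modified benchmarks.rb goes into the commit without a separate git add, while the new file is left untracked.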
Removing Files
To remove a file from Git, you have to remove it from your tracked files (more accurately, remove it from your staging area) and then commit. The git rm command does that and also removes the file from your working directory so you don’t see it as an untracked file next time around.
If you simply remove the file from your working directory, it shows up under the "Changed but not updated" (that is, unstaged) area of your git status output:
$ rm grit.gemspec
$ git status
# On branch master
#
# Changed but not updated:
# (use "git add/rm <file>..." to update what will be committed)
#
# deleted: grit.gemspec
#
Then, if you run git rm, it stages the file’s removal:
$ git rm grit.gemspec
rm 'grit.gemspec'
$ git status
# On branch master
#
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# deleted: grit.gemspec
#
The next time you commit, the file will be gone and no longer tracked. If you modified the file and added it to the index already, you must force the removal with the -f option. This is a safety feature to prevent accidental removal of data that hasn’t yet been recorded in a snapshot and that can’t be recovered from Git.
Another useful thing you may want to do is to keep the file in your working tree but remove it from your staging area. In other words, you may want to keep the file on your hard drive but not have Git track it anymore. This is particularly useful if you forgot to add something to your .gitignore file and accidentally added it, like a large log file or a bunch of .a compiled files. To do this, use the --cached option:
$ git rm --cached readme.txt
You can pass files, directories, and file-glob patterns to the git rm command. That means you can do things such as
$ git rm log/\*.log
Note the backslash (\) in front of the *. This is necessary because Git does its own file name expansion in addition to your shell’s file name expansion. This command removes all files that have the .log extension in the log/ directory. Or, you can do something like this:
$ git rm \*.txt
This command removes all files that end with .txt.
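The --cached behavior described above can be tried safely in a throwaway repository. The following is a minimal sketch, not part of the original text: the temporary directory, the file name app.log, and the user identity are all assumptions made for illustration.

```shell
# Sketch: stop tracking a file but keep it on disk.
# The repo location, app.log, and the identity below are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "debug output" > app.log
git add app.log
git commit -q -m "accidentally track a log file"

# Remove the file from the index only; the working copy is left alone
git rm -q --cached app.log
git commit -q -m "stop tracking app.log"

tracked=$(git ls-files)   # files Git still tracks (nothing in this sketch)
ls app.log                # the file itself is still on disk
```

After this, app.log shows up as untracked in git status until it is listed in .gitignore.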
Moving Files
Unlike many other VCS systems, Git doesn’t explicitly track file movement. If you rename a file in Git, no metadata is stored in Git that tells it you renamed the file. However, Git is pretty smart about figuring that out after the fact — we’ll deal with detecting file movement a bit later. Thus it’s a bit confusing that Git has a mv command. If you want to rename a file in Git, you can run something like
$ git mv file_from file_to
and it works fine. In fact, if you run something like this and look at the status, you’ll see that Git considers it a renamed file:
$ git mv README.txt README
$ git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# renamed: README.txt -> README
#
However, this is equivalent to running something like this:
Git figures out that it’s a rename implicitly, so it doesn’t matter if you rename a file that way or with the mv command. The only real difference is that mv is one command instead of three; it’s a convenience function. More important, you can use any tool you like to rename a file, and address the add/rm later, before you commit.
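The three-command form mentioned above can be sketched as follows. This is an illustration in a throwaway repository, assuming a plain mv followed by git rm and git add; the temporary directory and user identity are hypothetical.

```shell
# Sketch: renaming a file without git mv, then letting Git detect the rename.
# The repo location and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "add README.txt"

# Three commands that together do what `git mv README.txt README` does
mv README.txt README
git rm -q README.txt
git add README

status=$(git status --short)   # staged changes in short format
echo "$status"
```

Because the content is unchanged, git status should report the change as a rename rather than as a deletion plus an addition.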
Viewing the Commit History
After you have created several commits, or if you have cloned a repository with an existing commit history, you’ll probably want to look back to see what has happened. The most basic and powerful tool to do this is the git log command.
These examples use a very simple project called simplegit that I often use for demonstrations. To get the project, run
When you run git log in this project, you should get output that looks something like this:
$ git log
commit ca82a6dff817ec66f44342007202690a93763949
Author: Hoang Thanh Son <[email protected]>
Date: Mon Mar 17 21:52:11 2008 -0700
changed the verison number
commit 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Author: Hoang Thanh Son <[email protected]>
Date: Sat Mar 15 16:40:33 2008 -0700
removed unnecessary test code
commit a11bef06a3f659402fe7563abf99ad00de2209e6
Author: Hoang Thanh Son <[email protected]>
Date: Sat Mar 15 10:31:28 2008 -0700
first commit
By default, with no arguments, git log lists the commits made in that repository in reverse chronological order. That is, the most recent commits show up first. As you can see, this command lists each commit with its SHA-1 checksum, the author’s name and e-mail, the date written, and the commit message. A huge number and variety of options to the git log command are available to show you exactly what you’re looking for. Here, we’ll show you some of the most-used options. One of the more helpful options is -p, which shows the diff introduced in each commit. You can also use -2, which limits the output to only the last two entries:
$ git log -p -2
commit ca82a6dff817ec66f44342007202690a93763949
Author: Hoang Thanh Son <[email protected]>
Date: Mon Mar 17 21:52:11 2008 -0700
changed the verison number
diff --git a/Rakefile b/Rakefile
index a874b73..8f94139 100644
--- a/Rakefile
+++ b/Rakefile
@@ -5,7 +5,7 @@ require 'rake/gempackagetask'
spec = Gem::Specification.new do |s|
- s.version = "0.1.0"
+ s.version = "0.1.1"
s.author = "Hoang Thanh Son"
commit 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Author: Hoang Thanh Son <[email protected]>
Date: Sat Mar 15 16:40:33 2008 -0700
removed unnecessary test code
diff --git a/lib/simplegit.rb b/lib/simplegit.rb
index a0a60ae..47c6340 100644
--- a/lib/simplegit.rb
+++ b/lib/simplegit.rb
@@ -18,8 +18,3 @@ class SimpleGit
end
end
-if $0 == __FILE__
- git = SimpleGit.new
- puts git.show
-end
\ No newline at end of file
This option displays the same information but with a diff directly following each entry. This is very helpful for code review or to quickly browse what happened during a series of commits that a collaborator has added. You can also use a series of summarizing options with git log. For example, if you want to see some abbreviated stats for each commit, you can use the --stat option:
$ git log --stat
commit ca82a6dff817ec66f44342007202690a93763949
Author: Hoang Thanh Son <[email protected]>
Date: Mon Mar 17 21:52:11 2008 -0700
changed the verison number
Rakefile | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
commit 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Author: Hoang Thanh Son <[email protected]>
Date: Sat Mar 15 16:40:33 2008 -0700
removed unnecessary test code
lib/simplegit.rb | 5 ----
1 files changed, 0 insertions(+), 5 deletions(-)
commit a11bef06a3f659402fe7563abf99ad00de2209e6
Author: Hoang Thanh Son <[email protected]>
Date: Sat Mar 15 10:31:28 2008 -0700
first commit
README | 6 ++++++
Rakefile | 23 +++++++++++++++++++++++
lib/simplegit.rb | 25 +++++++++++++++++++++++++
3 files changed, 54 insertions(+), 0 deletions(-)
As you can see, the --stat option prints below each commit entry a list of modified files, how many files were changed, and how many lines in those files were added and removed. It also puts a summary of the information at the end. Another really useful option is --pretty. This option changes the log output to formats other than the default. A few prebuilt options are available for you to use. The oneline option prints each commit on a single line, which is useful if you’re looking at a lot of commits. In addition, the short, full, and fuller options show the output in roughly the same format but with less or more information, respectively:
$ git log --pretty=oneline
ca82a6dff817ec66f44342007202690a93763949 changed the verison number
085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7 removed unnecessary test code
a11bef06a3f659402fe7563abf99ad00de2209e6 first commit
The most interesting option is format, which allows you to specify your own log output format. This is especially useful when you’re generating output for machine parsing — because you specify the format explicitly, you know it won’t change with updates to Git:
$ git log --pretty=format:"%h - %an, %ar : %s"
ca82a6d - Hoang Thanh Son, 11 months ago : changed the verison number
085bb3b - Hoang Thanh Son, 11 months ago : removed unnecessary test code
a11bef0 - Hoang Thanh Son, 11 months ago : first commit
Option   Description of Output
%H       Commit hash
%h       Abbreviated commit hash
%T       Tree hash
%t       Abbreviated tree hash
%P       Parent hashes
%p       Abbreviated parent hashes
%an      Author name
%ae      Author e-mail
%ad      Author date (format respects the --date= option)
%ar      Author date, relative
%cn      Committer name
%ce      Committer email
%cd      Committer date
%cr      Committer date, relative
%s       Subject
Table 2.1 lists some of the more useful options that format takes.
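To see a few of these placeholders in action, here is a small sketch in a throwaway repository. The commit messages, file name, and user identity are assumptions made for illustration only.

```shell
# Sketch: a custom log format built from %an, %ae and %s.
# The repo location, file name and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first commit"
echo "two" >> file.txt
git commit -q -am "second commit"

# %an = author name, %ae = author e-mail, %s = subject
git log --pretty=format:"%an <%ae> : %s"

newest=$(git log -1 --pretty=format:"%s")   # subject of the most recent commit
```

Each commit is printed on its own line, newest first, in exactly the layout the format string describes.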
You may be wondering what the difference is between author and committer. The author is the person who originally wrote the work, whereas the committer is the person who last applied the work. So, if you send in a patch to a project and one of the core members applies the patch, both of you get credit: you as the author and the core member as the committer. We’ll cover this distinction a bit more in Chapter 5.
The oneline and format options are particularly useful with another log option called --graph. This option adds a nice little ASCII graph showing your branch and merge history, which we can see in our copy of the Grit project repository:
$ git log --pretty=format:"%h %s" --graph
* 2d3acf9 ignore errors from SIGCHLD on trap
* 5e3ee11 Merge branch 'master' of git://github.com/dustin/grit
|\
| * 420eac9 Added a method for getting the current branch.
* | 30e367c timeout code and tests
* | 5a09431 add timeout protection to grit
* | e1193f8 support for heads with slashes in them
|/
* d6016bc require time for xmlschema
* 11d191e Merge branch 'defunkt' into local
Those are only some simple output-formatting options to git log — there are many more.
Option             Description
-p                 Show the patch introduced with each commit.
--stat             Show statistics for files modified in each commit.
--shortstat        Display only the changed/insertions/deletions line from the --stat command.
--name-only        Show the list of files modified after the commit information.
--name-status      Show the list of files affected with added/modified/deleted information as well.
--abbrev-commit    Show only the first few characters of the SHA-1 checksum instead of all 40.
--relative-date    Display the date in a relative format (for example, "2 weeks ago") instead of using the full date format.
--graph            Display an ASCII graph of the branch and merge history beside the log output.
--pretty           Show commits in an alternate format. Options include oneline, short, full, fuller, and format (where you specify your own format).
Table 2.2 lists the options we’ve covered so far and some other common formatting options that may be useful, along with how they change the output of the log command.
Limiting Log Output
In addition to output-formatting options, git log takes a number of useful limiting options, that is, options that let you show only a subset of commits. You’ve seen one such option already: the -2 option, which shows only the last two commits. In fact, you can do -<n>, where n is any integer to show the last n commits. In reality, you’re unlikely to use that often, because Git by default pipes all output through a pager so you see only one page of log output at a time.
However, the time-limiting options such as --since and --until are very useful. For example, this command gets the list of commits made in the last two weeks:
$ git log --since=2.weeks
This command works with lots of formats: you can specify a specific date ("2008-01-15") or a relative date such as "2 years 1 day 3 minutes ago". You can also filter the list to commits that match some search criteria. The --author option allows you to filter on a specific author, and the --grep option lets you search for keywords in the commit messages. (Note that if you want to specify both author and grep options, you have to add --all-match or the command will match commits with either.) The last really useful option to pass to git log as a filter is a path. If you specify a directory or file name, you can limit the log output to commits that introduced a change to those files. This is always the last option and is generally preceded by double dashes (--) to separate the paths from the options. In Table 2.3 we’ll list these and a few other common options for your reference.
Option              Description
-(n)                Show only the last n commits.
--since, --after    Limit the commits to those made after the specified date.
--until, --before   Limit the commits to those made before the specified date.
--author            Only show commits in which the author entry matches the specified string.
--committer         Only show commits in which the committer entry matches the specified string.
For example, if you want to see which commits modifying test files in the Git source code history were committed by Junio Hamano and were not merges in the month of October 2008, you can run something like this:
Of the nearly 20,000 commits in the Git source code history, this command shows the 6 that match those criteria.
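As a rough illustration of how limiting options combine, here is a sketch against a tiny throwaway repository rather than the Git source history. The directory t/, the file names, the author string, and the time window are assumptions chosen for the example.

```shell
# Sketch: combine author, time, merge and path filters in one git log call.
# The repo layout, file names and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir t
echo "test" > t/basic.sh
git add t
git commit -q -m "add a test"

echo "code" > main.c
git add main.c
git commit -q -m "add code"

# Everything after the bare -- is treated as a path, not an option
git log --pretty="%h - %s" --author="Example" --since="2.weeks" --no-merges -- t/

matches=$(git log --pretty="%s" --no-merges -- t/)   # commits that touched t/
```

Only the commit that touched a file under t/ is listed; the commit that added main.c is filtered out by the path.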
Using a GUI to Visualize History
If you like to use a more graphical tool to visualize your commit history, you may want to take a look at a Tcl/Tk program called gitk that is distributed with Git. Gitk is basically a visual git log tool, and it accepts nearly all the filtering options that git log does. If you type gitk on the command line in your project, you should see something like Figure 2.2.
Figure 2.2: The gitk history visualizer
You can see the commit history in the top half of the window along with a nice ancestry graph. The diff viewer in the bottom half of the window shows you the changes introduced at any commit you click.
Undoing Things
At any stage, you may want to undo something. Here, we’ll review a few basic tools for undoing changes that you’ve made. Be careful, because you can’t always undo some of these undos. This is one of the few areas in Git where you may lose some work if you do it wrong.
Changing Your Last Commit
One of the common undos takes place when you commit too early and possibly forget to add some files, or you mess up your commit message. If you want to try that commit again, you can run commit with the --amend option:
$ git commit --amend
This command takes your staging area and uses it for the commit. If you’ve made no changes since your last commit (for instance, you run this command immediately after your previous commit), then your snapshot will look exactly the same and all you’ll change is your commit message.
The same commit-message editor fires up, but it already contains the message of your previous commit. You can edit the message the same as always, but it overwrites your previous commit.
As an example, if you commit and then realize you forgot to stage the changes in a file you wanted to add to this commit, you can do something like this:
All three of these commands end up with a single commit — the second command replaces the results of the first.
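A commit, add, amend sequence of this kind can be sketched in a throwaway repository as follows. The file names and identity are hypothetical, and --no-edit is used here only so the sketch runs without opening the commit-message editor.

```shell
# Sketch: fold a forgotten file into the previous commit with --amend.
# The repo location, file names and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "main" > main.txt
echo "forgotten" > forgotten_file
git add main.txt
git commit -q -m "initial commit"     # forgotten_file was not staged

git add forgotten_file
git commit -q --amend --no-edit       # replaces the previous commit, keeping its message

count=$(git rev-list --count HEAD)    # number of commits in the history
echo "commits in history: $count"
```

The history still contains a single commit, and that commit now includes both files.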
Unstaging a Staged File
The next two sections demonstrate how to wrangle your staging area and working directory changes. The nice part is that the command you use to determine the state of those two areas also reminds you how to undo changes to them. For example, let’s say you’ve changed two files and want to commit them as two separate changes, but you accidentally type git add * and stage them both. How can you unstage one of the two? The git status command reminds you:
$ git add .
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: README.txt
# modified: benchmarks.rb
#
Right below the "Changes to be committed" text, it says use git reset HEAD <file>... to unstage. So, let’s use that advice to unstage the benchmarks.rb file:
$ git reset HEAD benchmarks.rb
benchmarks.rb: locally modified
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: README.txt
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: benchmarks.rb
#
The command is a bit strange, but it works. The benchmarks.rb file is modified but once again unstaged.
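The effect of the unstaging step can be checked without reading the full status output. The sketch below assumes a throwaway repository with the same two file names as the example; the file contents and identity are made up for illustration.

```shell
# Sketch: stage two files, then unstage one of them with git reset HEAD.
# The repo location, file contents and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "a" > README.txt
echo "b" > benchmarks.rb
git add .
git commit -q -m "initial commit"

echo "a2" >> README.txt
echo "b2" >> benchmarks.rb
git add .                             # both changes are now staged

git reset -q HEAD benchmarks.rb       # unstage one file; its edits stay in the working directory

staged=$(git diff --cached --name-only)   # what the next commit would contain
unstaged=$(git diff --name-only)          # modified but not staged
echo "staged: $staged / unstaged: $unstaged"
```

README.txt remains staged while benchmarks.rb goes back to being modified but unstaged.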
Unmodifying a Modified File
What if you realize that you don’t want to keep your changes to the benchmarks.rb file? How can you easily unmodify it — revert it back to what it looked like when you last committed (or initially cloned, or however you got it into your working directory)? Luckily, git status tells you how to do that, too. In the last example output, the unstaged area looks like this:
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: benchmarks.rb
#
It tells you pretty explicitly how to discard the changes you’ve made (at least, the newer versions of Git, 1.6.1 and later, do this — if you have an older version, we highly recommend upgrading it to get some of these nicer usability features). Let’s do what it says:
$ git checkout -- benchmarks.rb
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: README.txt
#
You can see that the changes have been reverted. You should also realize that this is a dangerous command: any changes you made to that file are gone; you just copied another file over it. Don’t ever use this command unless you absolutely know that you don’t want the file. If you just need to get it out of the way, we’ll go over stashing and branching in the next chapter; these are generally better ways to go. Remember, anything that is committed in Git can almost always be recovered. Even commits that were on branches that were deleted or commits that were overwritten with an --amend commit can be recovered (see Chapter 9 for data recovery). However, anything you lose that was never committed is likely never to be seen again.
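To make the danger concrete, the following sketch shows an uncommitted edit disappearing after git checkout -- on the file. It runs in a throwaway repository, and the file contents and identity are assumptions for illustration.

```shell
# Sketch: discard an uncommitted edit; the edit cannot be recovered afterwards.
# The repo location, file contents and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > benchmarks.rb
git add benchmarks.rb
git commit -q -m "add benchmarks.rb"

echo "unwanted edit" >> benchmarks.rb   # never staged, never committed

git checkout -- benchmarks.rb           # overwrite the file with the committed version

content=$(cat benchmarks.rb)
echo "$content"
```

The file is back to its committed content, and the appended line exists nowhere in the repository.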
Working with Remotes
To be able to collaborate on any Git project, you need to know how to manage your remote repositories. Remote repositories are versions of your project that are hosted on the Internet or network somewhere. You can have several of them, each of which generally is either read-only or read/write for you. Collaborating with others involves managing these remote repositories and pushing and pulling data to and from them when you need to share work. Managing remote repositories includes knowing how to add remote repositories, remove remotes that are no longer valid, manage various remote branches and define them as being tracked or not, and more. In this section, we’ll cover these remote-management skills.
Showing Your Remotes
To see which remote servers you have configured, you can run the git remote command. It lists the shortnames of each remote handle you’ve specified. If you’ve cloned your repository, you should at least see origin — that is the default name Git gives to the server you cloned from:
This means we can pull contributions from any of these users pretty easily. But notice that only the origin remote is an SSH URL, so it’s the only one I can push to (we’ll cover why this is in Chapter 4).
Adding Remote Repositories
I’ve mentioned and given some demonstrations of adding remote repositories in previous sections, but here is how to do it explicitly. To add a new remote Git repository as a shortname you can reference easily, run git remote add [shortname] [url]:
Now you can use the string pb on the command line in lieu of the whole URL. For example, if you want to fetch all the information that Paul has but that you don’t yet have in your repository, you can run git fetch pb:
Paul’s master branch is accessible locally as pb/master — you can merge it into one of your branches, or you can check out a local branch at that point if you want to inspect it.
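Adding a remote and fetching from it can be sketched locally. In the example below a second local repository stands in for Paul's; its path, the file name, and the identity are hypothetical, and the branch name is read from the repository because the default differs between Git installations.

```shell
# Sketch: add a remote named pb and fetch from it.
# A local directory stands in for a network URL; all names are hypothetical.
set -e
work=$(mktemp -d)

# Stand-in for Paul's repository
git init -q "$work/paul"
cd "$work/paul"
git config user.name "Paul Example"
git config user.email "paul@example.com"
echo "paul" > notes.txt
git add notes.txt
git commit -q -m "paul's work"
branch=$(git rev-parse --abbrev-ref HEAD)   # default branch name varies by installation

# Your own repository
git init -q "$work/mine"
cd "$work/mine"
git remote add pb "$work/paul"
git fetch -q pb

git remote -v       # shows the shortname pb and the location it points to
git branch -r       # lists the remote-tracking branch pb/<branch>
```

After the fetch, Paul's branch is available as pb/<branch> and can be merged or inspected like any other reference.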
Fetching and Pulling from Your Remotes
As you just saw, to get data from your remote projects, you can run
$ git fetch [remote-name]
The command goes out to that remote project and pulls down all the data from that remote project that you don’t have yet. After you do this, you should have references to all the branches from that remote, which you can merge in or inspect at any time. (We’ll go over what branches are and how to use them in much more detail in Chapter 3.)
If you cloned a repository, the command automatically adds that remote repository under the name origin. So, git fetch origin fetches any new work that has been pushed to that server since you cloned (or last fetched from) it. It’s important to note that the fetch command pulls the data to your local repository — it doesn’t automatically merge it with any of your work or modify what you’re currently working on. You have to merge it manually into your work when you’re ready.
If you have a branch set up to track a remote branch (see the next section and Chapter 3 for more information), you can use the git pull command to automatically fetch and then merge a remote branch into your current branch. This may be an easier or more comfortable workflow for you; and by default, the git clone command automatically sets up your local master branch to track the remote master branch on the server you cloned from (assuming the remote has a master branch). Running git pull generally fetches data from the server you originally cloned from and automatically tries to merge it into the code you’re currently working on.
Pushing to Your Remotes
When you have your project at a point that you want to share, you have to push it upstream. The command for this is simple: git push [remote-name] [branch-name]. If you want to push your master branch to your origin server (again, cloning generally sets up both of those names for you automatically), then you can run this to push your work back up to the server:
$ git push origin master
This command works only if you cloned from a server to which you have write access and if nobody has pushed in the meantime. If you and someone else clone at the same time and they push upstream and then you push upstream, your push will rightly be rejected. You’ll have to pull down their work first and incorporate it into yours before you’ll be allowed to push. See Chapter 3 for more detailed information on how to push to remote servers.
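The rejected-push scenario can be reproduced locally with a bare repository acting as the server. This is only a sketch: the two users, the file names, and the paths are hypothetical, and the --no-rebase and --no-edit flags are added so that git pull performs a merge without prompting.

```shell
# Sketch: a push is rejected until the other person's work is pulled in.
# The bare "server", the users and all file names are hypothetical.
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"

# Alice seeds the server with a first commit
git clone -q "$work/server.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "base commit"
git push -q origin HEAD

# Bob clones, then both commit independently
git clone -q "$work/server.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob Example"
git config user.email "bob@example.com"
branch=$(git rev-parse --abbrev-ref HEAD)
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "bob's change"

cd "$work/alice"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "alice's change"
git push -q origin HEAD               # Alice pushes first

cd "$work/bob"
if git push -q origin HEAD 2>/dev/null; then
    rejected=no
else
    rejected=yes                      # the server has work Bob does not have yet
fi

# Incorporate Alice's work, then push again
git pull -q --no-rebase --no-edit origin "$branch"
git push -q origin HEAD
echo "first push rejected: $rejected"
```

Bob's first push fails, and the second succeeds once his history contains Alice's commit.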
Inspecting a Remote
If you want to see more information about a particular remote, you can use the git remote show [remote-name] command. If you run this command with a particular shortname, such as origin, you get something like this:
$ git remote show origin
* remote origin
URL: git://github.com/wingadium1/ticgit.git
Remote branch merged with 'git pull' while on branch master
master
Tracked remote branches
master
ticgit
It lists the URL for the remote repository as well as the tracking branch information. The command helpfully tells you that if you’re on the master branch and you run git pull, it will automatically merge in the master branch on the remote after it fetches all the remote references. It also lists all the remote references it has pulled down.
That is a simple example you’re likely to encounter. When you’re using Git more heavily, however, you may see much more information from git remote show:
$ git remote show origin
* remote origin
URL: [email protected]:defunkt/github.git
Remote branch merged with 'git pull' while on branch issues
issues
Remote branch merged with 'git pull' while on branch master
master
New remote branches (next fetch will store in remotes/origin)
caching
Stale tracking branches (use 'git remote prune')
libwalker
walker2
Tracked remote branches
acl
apiv2
dashboard2
issues
master
postgres
Local branch pushed with 'git push'
master:master
This command shows which branch is automatically pushed when you run git push on certain branches. It also shows you which remote branches on the server you don’t yet have, which remote branches you have that have been removed from the server, and multiple branches that are automatically merged when you run git pull.
Removing and Renaming Remotes
If you want to rename a reference, in newer versions of Git you can run git remote rename to change a remote’s shortname. For instance, if you want to rename pb to paul, you can do so with git remote rename:
$ git remote rename pb paul
$ git remote
origin
paul
It’s worth mentioning that this changes your remote branch names, too. What used to be referenced at pb/master is now at paul/master. If you want to remove a reference for some reason — you’ve moved the server or are no longer using a particular mirror, or perhaps a contributor isn’t contributing anymore — you can use git remote rm:
$ git remote rm paul
$ git remote
origin
Tagging
Like most VCSs, Git has the ability to tag specific points in history as being important. Generally, people use this functionality to mark release points (v1.0, and so on). In this section, you’ll learn how to list the available tags, how to create new tags, and what the different types of tags are.
Listing Your Tags
Listing the available tags in Git is straightforward. Just type git tag:
$ git tag
v0.1
v1.3
This command lists the tags in alphabetical order; the order in which they appear has no real importance. You can also search for tags with a particular pattern. The Git source repo, for instance, contains more than 240 tags. If you’re only interested in looking at the 1.4.2 series, you can run this:
$ git tag -l 'v1.4.2.*'
v1.4.2.1
v1.4.2.2
v1.4.2.3
v1.4.2.4
Creating Tags
Git uses two main types of tags: lightweight and annotated. A lightweight tag is very much like a branch that doesn’t change — it’s just a pointer to a specific commit. Annotated tags, however, are stored as full objects in the Git database. They’re checksummed; contain the tagger name, e-mail, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG). It’s generally recommended that you create annotated tags so you can have all this information; but if you want a temporary tag or for some reason don’t want to keep the other information, lightweight tags are available too.
Annotated Tags
Creating an annotated tag in Git is simple. The easiest way is to specify -a when you run the tag command:
$ git tag -a v1.4 -m 'my version 1.4'
$ git tag
v0.1
v1.3
v1.4
The -m specifies a tagging message, which is stored with the tag. If you don’t specify a message for an annotated tag, Git launches your editor so you can type it in. You can see the tag data along with the commit that was tagged by using the git show command:
$ git show v1.4
tag v1.4
Tagger: Hoang Thanh Son <[email protected]>
Date: Mon Feb 9 14:45:11 2009 -0800
my version 1.4
commit 15027957951b64cf874c3557a0f3547bd83b3ff6
Merge: 4a447f7... a6b4c97...
Author: Hoang Thanh Son <[email protected]>
Date: Sun Feb 8 19:02:46 2009 -0800
Merge branch 'experiment'
That shows the tagger information, the date the commit was tagged, and the annotation message before showing the commit information.
Signed Tags
You can also sign your tags with GPG, assuming you have a private key. All you have to do is use -s instead of -a:
$ git tag -s v1.5 -m 'my signed 1.5 tag'
You need a passphrase to unlock the secret key for user: "Hoang Thanh Son <[email protected]>"
1024-bit DSA key, ID F721C45A, created 2009-02-09
If you run git show on that tag, you can see your GPG signature attached to it:
$ git show v1.5
tag v1.5
Tagger: Hoang Thanh Son <[email protected]>
Date: Mon Feb 9 15:22:20 2009 -0800
my signed 1.5 tag
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
iEYEABECAAYFAkmQurIACgkQON3DxfchxFr5cACeIMN+ZxLKggJQf0QYiQBwgySN
Ki0An2JeAVUCAiJ7Ox6ZEtK+NvZAj82/
=WryJ
-----END PGP SIGNATURE-----
commit 15027957951b64cf874c3557a0f3547bd83b3ff6
Merge: 4a447f7... a6b4c97...
Author: Hoang Thanh Son <[email protected]>
Date: Sun Feb 8 19:02:46 2009 -0800
Merge branch 'experiment'
A bit later, you’ll learn how to verify signed tags.
Lightweight Tags
Another way to tag commits is with a lightweight tag. This is basically the commit checksum stored in a file — no other information is kept. To create a lightweight tag, don’t supply the -a, -s, or -m option:
$ git tag v1.4-lw
$ git tag
v0.1
v1.3
v1.4
v1.4-lw
v1.5
This time, if you run git show on the tag, you don’t see the extra tag information. The command just shows the commit:
$ git show v1.4-lw
commit 15027957951b64cf874c3557a0f3547bd83b3ff6
Merge: 4a447f7... a6b4c97...
Author: Hoang Thanh Son <[email protected]>
Date: Sun Feb 8 19:02:46 2009 -0800
Merge branch 'experiment'
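The difference between the two kinds of tag can also be seen by asking Git what type of object each tag name refers to. The sketch below uses a throwaway repository; the file name and identity are hypothetical, and the tag names simply mirror the ones used above.

```shell
# Sketch: an annotated tag is its own object, a lightweight tag is only a pointer.
# The repo location, file name and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "release work"

git tag -a v1.4 -m "my version 1.4"   # annotated: stored as a full tag object
git tag v1.4-lw                       # lightweight: just a name for the commit

annotated_type=$(git cat-file -t v1.4)
lightweight_type=$(git cat-file -t v1.4-lw)
echo "v1.4 is a $annotated_type, v1.4-lw is a $lightweight_type"
```

The annotated tag resolves to a tag object that carries the tagger and message, while the lightweight tag resolves directly to the commit.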
Verifying Tags
To verify a signed tag, you use git tag -v [tag-name]. This command uses GPG to verify the signature. You need the signer’s public key in your keyring for this to work properly:
$ git tag -v v1.4.2.1
object 883653babd8ee7ea23e6a5c392bb739348b1eb61
type commit
tag v1.4.2.1
tagger Junio C Hamano <[email protected]> 1158138501 -0700
GIT 1.4.2.1
Minor fixes since 1.4.2, including git-mv and git-http with alternates.
gpg: Signature made Wed Sep 13 02:08:25 2006 PDT using DSA key ID F3119B9A
gpg: Good signature from "Junio C Hamano <[email protected]>"
gpg: aka "[jpeg image of size 1513]"
Primary key fingerprint: 3565 2A26 2040 E066 C9A7 4A7D C0C6 D9A4 F311 9B9A
If you don’t have the signer’s public key, you get something like this instead:
gpg: Signature made Wed Sep 13 02:08:25 2006 PDT using DSA key ID F3119B9A
gpg: Can't check signature: public key not found
error: could not verify the tag 'v1.4.2.1'
Tagging Later
You can also tag commits after you’ve moved past them. Suppose your commit history looks like this:
$ git log --pretty=oneline
15027957951b64cf874c3557a0f3547bd83b3ff6 Merge branch 'experiment'
a6b4c97498bd301d84096da251c98a07c7723e65 beginning write support
0d52aaab4479697da7686c15f77a3d64d9165190 one more thing
6d52a271eda8725415634dd79daabbc4d9b6008e Merge branch 'experiment'
0b7434d86859cc7b8c3d5e1dddfed66ff742fcbc added a commit function
4682c3261057305bdd616e23b64b0857d832627b added a todo file
166ae0c4d3f420721acbb115cc33848dfcc2121a started write support
9fceb02d0ae598e95dc970b74767f19372d61af8 updated rakefile
964f16d36dfccde844893cac5b347e7b3d44abbc commit the todo
8a5cbc430f1a9c3d00faaeffd07798508422908a updated readme
Now, suppose you forgot to tag the project at v1.2, which was at the "updated rakefile" commit. You can add it after the fact. To tag that commit, you specify the commit checksum (or part of it) at the end of the command:
$ git tag -a v1.2 9fceb02
You can see that you’ve tagged the commit:
$ git tag
v0.1
v1.2
v1.3
v1.4
v1.4-lw
v1.5
$ git show v1.2
tag v1.2
Tagger: Hoang Thanh Son <[email protected]>
Date: Mon Feb 9 15:32:16 2009 -0800
version 1.2
commit 9fceb02d0ae598e95dc970b74767f19372d61af8
Author: Magnus Chacon <[email protected]>
Date: Sun Apr 27 20:43:35 2008 -0700
updated rakefile
...
Sharing Tags
By default, the git push command doesn’t transfer tags to remote servers. You will have to explicitly push tags to a shared server after you have created them. This process is just like sharing remote branches: you can run git push origin [tagname].
If you have a lot of tags that you want to push up at once, you can also use the --tags option to the git push command. This will transfer all of your tags to the remote server that are not already there.
Now, when someone else clones or pulls from your repository, they will get all your tags as well.
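Both ways of sharing tags can be sketched against a local bare repository standing in for the shared server. The paths, file name, identity, and tag names are assumptions for illustration.

```shell
# Sketch: push one tag explicitly, then push all remaining tags with --tags.
# The bare "server", file name and identity are hypothetical.
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"

git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "release work"
git remote add origin "$work/server.git"

git tag -a v1.5 -m "my version 1.5"
git tag v1.4-lw

git push -q origin v1.5        # push a single tag by name
git push -q origin --tags      # push every tag the server does not have yet

remote_tags=$(git ls-remote --tags origin)   # tags as the server sees them
echo "$remote_tags"
```

Listing the server's references afterwards shows both tags, even though no branch was pushed.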
Tips and Tricks
Before we finish this chapter on basic Git, a few little tips and tricks may make your Git experience a bit simpler, easier, or more familiar. Many people use Git without using any of these tips, and we won’t refer to them or assume you’ve used them later in the book; but you should probably know how to do them.
Auto-Completion
If you use the Bash shell, Git comes with a nice auto-completion script you can enable. Download the Git source code, and look in the contrib/completion directory; there should be a file called git-completion.bash. Copy this file to your home directory, and add this to your .bashrc file:
source ~/.git-completion.bash
If you want to set up Git to automatically have Bash shell completion for all users, copy this script to the /opt/local/etc/bash_completion.d directory on Mac systems or to the /etc/bash_completion.d/ directory on Linux systems. This is a directory of scripts that Bash will automatically load to provide shell completions.
If you’re using Windows with Git Bash, which is the default when installing Git on Windows with msysGit, auto-completion should be preconfigured.
Press the Tab key when you’re writing a Git command, and it should return a set of suggestions for you to pick from:
$ git co<tab><tab>
commit config
In this case, typing git co and then pressing the Tab key twice suggests commit and config. Adding m<tab> completes git commit automatically.
This also works with options, which is probably more useful. For instance, if you’re running a git log command and can’t remember one of the options, you can start typing it and press Tab to see what matches:
That’s a pretty nice trick and may save you some time and documentation reading.
Git Aliases
Git doesn’t infer your command if you type it in partially. If you don’t want to type the entire text of each of the Git commands, you can easily set up an alias for each command using git config. Here are a couple of examples you may want to set up:
This means that, for example, instead of typing git commit, you just need to type git ci. As you go on using Git, you’ll probably use other commands frequently as well; in this case, don’t hesitate to create new aliases.
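A few aliases of this kind might be set up as in the sketch below. Only ci is named in the text above; co, br, and st are illustrative choices. The aliases are written to a throwaway repository's own configuration (no --global) so the sketch does not touch your personal settings.

```shell
# Sketch: define short aliases with git config and use one of them.
# The alias names other than ci, the repo and the identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Repository-local aliases; add --global to make them available everywhere
git config alias.co checkout
git config alias.br branch
git config alias.ci commit
git config alias.st status

echo "x" > file.txt
git add file.txt
git ci -q -m "commit made through the ci alias"   # same as git commit

count=$(git rev-list --count HEAD)
echo "commits: $count"
```

The alias is expanded by Git itself, so any options that follow it are passed on to the underlying command.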
This technique can also be very useful in creating commands that you think should exist. For example, to correct the usability problem you encountered with unstaging a file, you can add your own unstage alias to Git:
$ git config --global alias.unstage 'reset HEAD --'
This makes the following two commands equivalent:
$ git unstage fileA
$ git reset HEAD fileA
This seems a bit clearer. It’s also common to add a last command, like this:
$ git config --global alias.last 'log -1 HEAD'
This way, you can see the last commit easily:
$ git last
commit 66938dae3329c7aebe598c2246a8e6af90d04646
Author: Josh Goebel <[email protected]>
Date: Tue Aug 26 19:48:51 2008 +0800
test for current head
Signed-off-by: Hoang Thanh Son <[email protected]>
As you can tell, Git simply replaces the new command with whatever you alias it for. However, maybe you want to run an external command, rather than a Git subcommand. In that case, you start the command with a ! character. This is useful if you write your own tools that work with a Git repository. We can demonstrate by aliasing git visual to run gitk:
$ git config --global alias.visual '!gitk'
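As a further sketch of the ! form, the alias below runs an ordinary shell command instead of a Git subcommand. The alias name hello and the echoed text are made up for illustration, and the temporary HOME and scratch repository are assumptions that keep the experiment away from a real configuration.

```shell
export HOME="$(mktemp -d)"          # assumption: disposable global config
cd "$(mktemp -d)" && git init -q    # scratch repository to run the alias in

# The leading ! tells Git to hand the rest of the string to the shell.
# Single quotes keep an interactive Bash from treating ! as history expansion.
git config --global alias.hello '!echo "hello from the shell"'

alias_output="$(git hello)"
echo "$alias_output"
```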
Summary
At this point, you can do all the basic local Git operations — creating or cloning a repository, making changes, staging and committing those changes, and viewing the history of all the changes the repository has been through. Next, we’ll cover Git’s killer feature: its branching model.
This chapter will be about getting started with Git. We will begin at the beginning by explaining some background on version control tools, then move on to how to get Git running on your system and finally how to get it set up to start working with. At the end of this chapter you should understand why Git is around, why you should use it and you should be all set up to do so.
About Version Control
What is version control, and why should you care? Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. For the examples in this book you will use software source code as the files being version controlled, though in reality you can do this with nearly any type of file on a computer.
If you are a graphic or web designer and want to keep every version of an image or layout (which you would most certainly want to), a Version Control System (VCS) is a very wise thing to use. It allows you to revert files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Using a VCS also generally means that if you screw things up or lose files, you can easily recover. In addition, you get all this for very little overhead.
Local Version Control Systems
Many people’s version-control method of choice is to copy files into another directory (perhaps a time-stamped directory, if they’re clever). This approach is very common because it is so simple, but it is also incredibly error prone. It is easy to forget which directory you’re in and accidentally write to the wrong file or copy over files you don’t mean to.
To deal with this issue, programmers long ago developed local VCSs that had a simple database that kept all the changes to files under revision control (see Figure 1.1).
Figure 1.1 Local version control diagram
One of the more popular VCS tools was a system called rcs, which is still distributed with many computers today. Even the popular Mac OS X operating system includes the rcs command when you install the Developer Tools. This tool basically works by keeping patch sets (that is, the differences between files) from one change to another in a special format on disk; it can then re-create what any file looked like at any point in time by adding up all the patches.
Centralized Version Control Systems
The next major issue that people encounter is that they need to collaborate with developers on other systems. To deal with this problem, Centralized Version Control Systems (CVCSs) were developed. These systems, such as CVS, Subversion and Perforce, have a single server that contains all the versioned files, and a number of clients that check out files from that central place. For many years, this has been the standard for version control (see Figure 1.2).
Figure 1.2 Centralized version control diagram
This setup offers many advantages, especially over local VCSs. For example, everyone knows to a certain degree what everyone else on the project is doing. Administrators have fine-grained control over who can do what; and it’s far easier to administer a CVCS than it is to deal with local databases on every client.
However, this setup also has some serious downsides. The most obvious is the single point of failure that the centralized server represents. If that server goes down for an hour, then during that hour nobody can collaborate at all or save versioned changes to anything they’re working on. If the hard disk the central database is on becomes corrupted, and proper backups haven’t been kept, you lose absolutely everything: the entire history of the project except whatever single snapshots people happen to have on their local machines. Local VCS systems suffer from this same problem: whenever you have the entire history of the project in a single place, you risk losing everything.
Distributed Version Control Systems
This is where Distributed Version Control Systems (DVCSs) step in. In a DVCS (such as Git, Mercurial, Bazaar or Darcs), clients don’t just check out the latest snapshot of the files: they fully mirror the repository. Thus if any server dies, and these systems were collaborating via it, any of the client repositories can be copied back up to the server to restore it. Every checkout is really a full backup of all the data (see Figure 1.3).
Figure 1.3 Distributed version control diagram
Furthermore, many of these systems deal pretty well with having several remote repositories they can work with, so you can collaborate with different groups of people in different ways simultaneously within the same project. This allows you to set up several types of workflows that aren’t possible in centralized systems, such as hierarchical models.
A Short History of Git
As with many great things in life, Git began with a bit of creative destruction and fiery controversy. The Linux kernel is an open source software project of fairly large scope. For most of the lifetime of the Linux kernel maintenance (1991-2002), changes to the software were passed around as patches and archived files.
In 2002, the Linux kernel project began using a proprietary DVCS system called BitKeeper.
In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and the tool’s free-of-charge status was revoked. This prompted the Linux development community (and in particular Linus Torvalds, the creator of Linux) to develop their own tool based on some of the lessons they learned while using BitKeeper. Some of the goals of the new system were as follows:
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Since its birth in 2005, Git has evolved and matured to be easy to use and yet retain these initial qualities. It’s incredibly fast, it’s very efficient with large projects, and it has an incredible branching system for non-linear development (see Chapter 3).
Git Basics
So, what is Git in a nutshell? This is an important section to absorb, because if you understand what Git is and the fundamentals of how it works, then using Git effectively will probably be much easier for you. As you learn Git, try to clear your mind of the things you may know about other VCSs, such as Subversion and Perforce; doing so will help you avoid subtle confusion when using the tool. Git stores and thinks about information much differently than these other systems, even though the user interface is fairly similar; understanding those differences will help prevent you from becoming confused while using it.
Snapshots, Not Differences
The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time, as illustrated in Figure 1.4.
Figure 1.4 Other systems tend to store data as changes to a base version of each file.
Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like Figure 1.5.
Figure 1.5 Git stores data as snapshots of the project over time.
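A small experiment can make the "link to the previous identical file" idea concrete. The sketch below assumes a scratch repository, made-up file names and a hypothetical identity. It commits twice while changing only one of two files, then compares the object IDs Git recorded for each file in both snapshots.

```shell
cd "$(mktemp -d)" && git init -q

# Hypothetical identity, passed inline so no prior configuration is needed.
commit() {
  git -c user.name="Example User" -c user.email="user@example.com" commit -q -m "$1"
}

echo "stable content" > unchanged.txt
echo "first draft" > edited.txt
git add . && commit "first snapshot"

echo "second draft" > edited.txt
git add . && commit "second snapshot"

# <commit>:<path> names the stored file object for that path in that snapshot.
unchanged_before="$(git rev-parse HEAD~1:unchanged.txt)"
unchanged_after="$(git rev-parse HEAD:unchanged.txt)"
edited_before="$(git rev-parse HEAD~1:edited.txt)"
edited_after="$(git rev-parse HEAD:edited.txt)"

echo "unchanged.txt: $unchanged_before -> $unchanged_after"
echo "edited.txt:    $edited_before -> $edited_after"
```

The untouched file keeps the same object ID in both commits, so the second snapshot simply points at the object stored for the first; only the edited file gets a new object.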
This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS. We’ll explore some of the benefits you gain by thinking of your data this way when we cover Git branching in Chapter 3.
Nearly Every Operation Is Local
Most operations in Git only need local files and resources to operate; generally no information is needed from another computer on your network. If you’re used to a CVCS where most operations have that network latency overhead, this aspect of Git will make you think that the gods of speed have blessed Git with unworldly powers. Because you have the entire history of the project right there on your local disk, most operations seem almost instantaneous.
For example, to browse the history of the project, Git doesn’t need to go out to the server to get the history and display it for you—it simply reads it directly from your local database. This means you see the project history almost instantly. If you want to see the changes introduced between the current version of a file and the file a month ago, Git can look up the file a month ago and do a local difference calculation, instead of having to either ask a remote server to do it or pull an older version of the file from the remote server to do it locally.
This also means that there is very little you can’t do if you’re offline or off VPN. If you get on an airplane or a train and want to do a little work, you can commit happily until you get to a network connection to upload. If you go home and can’t get your VPN client working properly, you can still work. In many other systems, doing so is either impossible or painful. In Perforce, for example, you can’t do much when you aren’t connected to the server; and in Subversion and CVS, you can edit files, but you can’t commit changes to your database (because your database is offline). This may not seem like a huge deal, but you may be surprised what a big difference it can make.
Git Has Integrity
Everything in Git is check-summed before it is stored and is then referred to by that checksum. This means it’s impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy. You can’t lose information in transit or get file corruption without Git being able to detect it. The mechanism that Git uses for this checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0-9 and a-f) and calculated based on the contents of a file or directory structure in Git. A SHA-1 hash looks something like this:
24b9da6552252987aa493b52f8696cd6d3b00373
You will see these hash values all over the place in Git because it uses them so much. In fact, Git stores everything not by file name but in the Git database addressable by the hash value of its contents.
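To see content addressing in action, the sketch below asks Git to compute the object ID for some text with git hash-object, a low-level command this chapter does not otherwise cover. The sample strings are arbitrary, and the sketch assumes Git's default SHA-1 object format.

```shell
# Hash the same content twice and slightly different content once.
first="$(printf 'version control\n' | git hash-object --stdin)"
second="$(printf 'version control\n' | git hash-object --stdin)"
changed="$(printf 'version control!\n' | git hash-object --stdin)"

echo "$first"
echo "$second"    # same content, so the same ID as the line above
echo "$changed"   # one extra character, so a different ID
```

Because the ID is derived only from the content, Git can notice any change or corruption simply by recomputing the hash and comparing.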
Git Generally Only Adds Data
When you do actions in Git, nearly all of them only add data to the Git database. It is very difficult to get the system to do anything that is not undoable or to make it erase data in any way. As in any VCS, you can lose or mess up changes you haven’t committed yet; but after you commit a snapshot into Git, it is very difficult to lose, especially if you regularly push your database to another repository.
This makes using Git a joy because we know we can experiment without the danger of severely screwing things up. For a more in-depth look at how Git stores its data and how you can recover data that seems lost, see "Under the Covers" in Chapter 9.
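As a rough illustration of how hard committed data is to lose, the sketch below discards a commit and then brings it back. It assumes a scratch repository, a hypothetical identity and file name, and the default behaviour in which Git keeps a reflog of where HEAD has pointed.

```shell
cd "$(mktemp -d)" && git init -q

# Hypothetical identity, passed inline so no prior configuration is needed.
commit() {
  git -c user.name="Example User" -c user.email="user@example.com" commit -q -m "$1"
}

echo "keep me" > work.txt
git add work.txt && commit "first"
echo "more work" >> work.txt
git add work.txt && commit "second"

# Discard the latest commit from the branch and the working directory.
git reset -q --hard HEAD~1
after_reset="$(cat work.txt)"

# The reflog still remembers the previous position of HEAD as HEAD@{1}.
git reset -q --hard 'HEAD@{1}'
after_recovery="$(cat work.txt)"

echo "after reset:    $after_reset"
echo "after recovery: $after_recovery"
```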
The Three States
Now, pay attention. This is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can reside in: committed, modified, and staged. Committed means that the data is safely stored in your local database. Modified means that you have changed the file but have not committed it to your database yet. Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
This leads us to the three main sections of a Git project: the Git directory, the working directory, and the staging area.
Figure 1.6 Working directory, staging area, and Git directory
The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
The staging area is a simple file, generally contained in your Git directory, that stores information about what will go into your next commit. It’s sometimes referred to as the index, but it’s becoming standard to refer to it as the staging area.
The basic Git workflow goes something like this:
You modify files in your working directory.
You stage the files, adding snapshots of them to your staging area.
You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
If a particular version of a file is in the Git directory, it’s considered committed. If it’s modified but has been added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified. In Chapter 2, you’ll learn more about these states and how you can either take advantage of them or skip the staged part entirely.
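The three states can be watched directly with git status --short, which prints a two-column code in front of each changed file. The following is a minimal sketch in a scratch repository; the file name and identity are hypothetical.

```shell
cd "$(mktemp -d)" && git init -q

# Hypothetical identity, passed inline so no prior configuration is needed.
commit() {
  git -c user.name="Example User" -c user.email="user@example.com" commit -q -m "$1"
}

echo "first line" > notes.txt
git add notes.txt && commit "add notes"

echo "second line" >> notes.txt
modified="$(git status --short)"    # changed in the working directory only

git add notes.txt
staged="$(git status --short)"      # snapshot now sits in the staging area

commit "extend notes"
committed="$(git status --short)"   # nothing left to report

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' \
  "$modified" "$staged" "$committed"
```

In the short format the right-hand column marks working-directory changes and the left-hand column marks staged changes, which is why the M moves from one column to the other after git add.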
Installing Git
Let’s get into using some Git. First things first—you have to install it. You can get it a number of ways; the two major ones are to install it from source or to install an existing package for your platform.
Installing from Source
Some people may instead find it useful to install Git from source, because you’ll get the most recent version. The binary installers tend to be a bit behind, though as Git has matured in recent years, this has made less of a difference.
If you do want to install Git from source, you need to have the following libraries that Git depends on: autotools, curl, zlib, openssl, expat, and libiconv. For example, if you’re on a system that has dnf (such as Fedora) or apt-get (such as a Debian-based system), you can use one of these commands to install the minimal dependencies for compiling and installing the Git binaries:
$ sudo dnf install dh-autoreconf curl-devel expat-devel gettext-devel \
  openssl-devel perl-devel zlib-devel
$ sudo apt-get install dh-autoreconf libcurl4-gnutls-dev libexpat1-dev gettext \
  libz-dev libssl-dev
In order to be able to add the documentation in various formats (doc, html, info), these additional dependencies are required:
$ sudo dnf install asciidoc xmlto docbook2X
$ sudo apt-get install asciidoc xmlto docbook2x
Note
Users of RHEL and RHEL-derivatives like CentOS and Scientific Linux will have to enable the EPEL repository to download the docbook2X package.
Installing on Linux
If you want to install Git on Linux via a binary installer, you can generally do so through the basic package-management tool that comes with your distribution. If you’re on Fedora, you can use yum:
$ yum install git-core
Or if you’re on a Debian-based distribution like Ubuntu, try apt-get:
$ sudo apt-get install git
The other major way is to install Git via Brew (https://brew.sh/).
You don’t have to add all the extras, but you’ll probably want to include +svn in case you ever have to use Git with Subversion repositories (see Chapter 8).
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to https://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://gitforwindows.org.
To get an automated installation you can use the Git Chocolatey package. Note that the Chocolatey package is community maintained.
Another easy way to get Git installed is by installing GitHub Desktop. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We’ll learn more about those things a little later, but suffice it to say they’re things you want. You can download this from the GitHub Desktop website.
First-Time Git Setup
Now that you have Git on your system, you’ll want to do a few things to customize your Git environment. You should have to do these things only once; they’ll stick around between upgrades. You can also change them at any time by running through the commands again.
Git comes with a tool called git config that lets you get and set configuration variables that control all aspects of how Git looks and operates. These variables can be stored in three different places:
/etc/gitconfig file: Contains values for every user on the system and all their repositories. If you pass the --system option to git config, it reads and writes from this file specifically.
.gitconfig file: Specific to your user. You can make Git read and write to this file specifically by passing the --global option.
config file in the Git directory (that is, .git/config) of whatever repository you’re currently using: Specific to that single repository. Each level overrides values in the previous level, so values in .git/config trump those in /etc/gitconfig.
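The override order can be checked with a quick experiment. The sketch below sets the same key at the user level and at the repository level and then asks Git which value it would actually use. The key core.editor and the editor names are only placeholders, and the temporary HOME and scratch repository are assumptions that keep the experiment away from a real configuration.

```shell
export HOME="$(mktemp -d)"          # assumption: disposable user-level config
cd "$(mktemp -d)" && git init -q    # scratch repository

git config --global core.editor emacs   # stored in the user-level file
git config --local core.editor vim      # stored in .git/config

global_value="$(git config --global core.editor)"
effective_value="$(git config core.editor)"   # the repository level wins

echo "global: $global_value, effective: $effective_value"

# Show every stored value together with the file it came from.
git config --show-origin --get-all core.editor
```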
On Windows systems, Git looks for the .gitconfig file in the $HOME directory (C:\Documents and Settings\$USER for most people). It also still looks for /etc/gitconfig, although it’s relative to the MSys root, which is wherever you decide to install Git on your Windows system when you run the installer.
Your Identity
The first thing you should do when you install Git is to set your user name and e-mail address. This is important because every Git commit uses this information, and it’s immutably baked into the commits you pass around:
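A minimal sketch of setting an identity follows. The name and address are placeholders to replace with your own, and the first line redirects HOME to a throwaway directory (an assumption made so the sketch can be run without altering a real configuration); drop it to apply the settings for real.

```shell
# Assumption: a temporary HOME keeps the global config written here disposable.
export HOME="$(mktemp -d)"

# Placeholder identity; substitute your own name and e-mail address.
git config --global user.name "Jane Example"
git config --global user.email jane@example.com

# Read the values back to confirm what Git stored.
git config --global user.name
git config --global user.email
```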
Again, you need to do this only once if you pass the --global option, because then Git will always use that information for anything you do on that system. If you want to override this with a different name or e-mail address for specific projects, you can run the command without the --global option when you’re in that project.
Your Editor
Now that your identity is set up, you can configure the default text editor that will be used when Git needs you to type in a message. By default, Git uses your system’s default editor, which is generally Vi or Vim. If you want to use a different text editor, such as Emacs, you can do the following:
$ git config --global core.editor emacs
Your Diff Tool
Another useful option you may want to configure is the default diff tool to use to resolve merge conflicts. Say you want to use vimdiff:
$ git config --global merge.tool vimdiff
Git accepts kdiff3, tkdiff, meld, xxdiff, emerge, vimdiff, gvimdiff, ecmerge, and opendiff as valid merge tools. You can also set up a custom tool; see Chapter 7 for more information about doing that.
Checking Your Settings
If you want to check your settings, you can use the git config --list command to list all the settings Git can find at that point:
You may see keys more than once, because Git reads the same key from different files (/etc/gitconfig and .gitconfig, for example). In this case, Git uses the last value for each unique key it sees. You can also check what Git thinks a specific key’s value is by typing git config key:
$ git config user.name
Hoang Thanh Son
Getting Help
If you ever need help while using Git, there are three ways to get the manual page (manpage) help for any of the Git commands:
$ git help <verb>
$ git <verb> --help
$ man git-<verb>
For example, you can get the manpage help for the config command by running:
$ git help config
These commands are nice because you can access them anywhere, even offline. If the man pages and this book aren’t enough and you need in-person help, you can try the #git or #github channel on the Freenode IRC server (irc.freenode.net). These channels are regularly filled with hundreds of people who are all very knowledgeable about Git and are often willing to help.
Summary
You should have a basic understanding of what Git is and how it’s different from the CVCS you may have been using. You should also now have a working version of Git on your system that’s set up with your personal identity. It’s now time to learn some Git basics.
If you can read only one chapter to get going with Git, this is it. This chapter covers every basic command you need to do the vast majority of the things you’ll eventually spend your time doing with Git. By the end of the chapter, you should be able to configure and initialize a repository, begin and stop tracking files, and stage and commit changes. We’ll also show you how to set up Git to ignore certain files and file patterns, how to undo mistakes quickly and easily, how to browse the history of your project and view changes between commits, and how to push and pull from remote repositories.